METHODS AND COMPOSITIONS FOR
IDENTIFYING OSTEOGENIC AGENTS

Technical Field

The present invention relates to assay techniques for identifying agents which 
modulate bone growth. 

Background of the Invention 

Although there is a great deal of information available on the factors which
influence the breakdown and resorption of bone, information on growth factors which
stimulate the formation of new bone is more limited. Investigators have searched for
sources of such activities and have found that bone tissue itself is a storehouse for
factors which have the capacity for stimulating bone cells. Thus, extracts of bovine
tissue obtained from slaughterhouses contain not only structural proteins which are
responsible for maintaining the structural integrity of bone, but also biologically
active bone growth factors which can stimulate bone cells to proliferate. Among these
latter factors are transforming growth factor β, the heparin-binding growth factors
(acidic and basic fibroblast growth factor), the insulin-like growth factors
(insulin-like growth factor I and insulin-like growth factor II) and a recently
described family of proteins called bone morphogenetic proteins (BMPs). All of these
growth factors have effects on other types of cells as well as on bone cells.

The BMPs are novel factors in the extended transforming growth factor β family.
They were first identified in extracts of demineralized bone (Urist 1965, Wozney et al.,
1988). Recombinant BMP-2 and BMP-4 can induce new bone formation when they are
injected locally into the subcutaneous tissues of rats (Wozney 1992, Wozney & Rosen
1993). These factors are expressed by normal osteoblasts as they differentiate, and have
been shown to stimulate osteoblast differentiation and bone nodule formation in vitro as
well as bone formation in vivo (Harris et al., 1994). This latter property suggests potential
usefulness as therapeutic agents in diseases which result in bone loss.

The cells which are responsible for forming bone are osteoblasts. As osteoblasts
differentiate from precursors to mature bone-forming cells, they express and secrete a
number of the structural proteins of the bone matrix including Type I collagen, osteocalcin,
osteopontin and alkaline phosphatase (Stein et al., 1990, Harris et al., 1994). They also
synthesize a number of growth regulatory peptides which are stored in the bone matrix and
are presumably responsible for normal bone formation. These growth regulatory peptides
include the BMPs (Harris et al., 1994). In studies of primary cultures of fetal rat calvarial
osteoblasts, BMPs 1, 2, 3, 4, and 6 are expressed by cultured cells prior to the formation of
mineralized bone nodules (Harris et al., 1994). Expression of the BMPs coincides with
expression of alkaline phosphatase, osteocalcin and osteopontin.

Although the BMPs have powerful effects to stimulate bone formation in vitro and
in vivo, there are disadvantages to their use as therapeutic agents to enhance bone healing.
Receptors for the bone morphogenetic proteins have been identified in many tissues, and
the BMPs themselves are expressed in a large variety of tissues in specific temporal and
spatial patterns. This suggests that they may have effects on many tissues other than bone,
potentially limiting their usefulness as therapeutic agents when administered systemically.
Moreover, since they are peptides, they would have to be administered by injection. These
disadvantages are severe limitations to the development of BMPs as therapeutic agents.

It is an object of the present invention to overcome the limitations inherent in
known osteogenic agents by providing a method to identify potential drugs which would
stimulate production of BMPs locally in bone.

Prior Art 

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously functionally identified or isolated.

Disclosure of the Invention 

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein, operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

This assay technique specifically identifies osteogenic agents which stimulate bone
cells to produce bone growth factors in the bone morphogenetic protein family. These
osteogenic agents display the capacity to increase the activity of the promoters of genes of
members of the BMP family and other bone growth factors normally produced by, e.g., bone
cells.

Also provided in accordance with the present invention are isolated DNA sequences 
encoding a promoter region of at least one bone morphogenetic protein, and a system for 
identifying osteogenic agents comprising an expression vector comprising such promoter 
sequences operatively linked to a reporter gene encoding an assayable product, and means 
for detecting the assayable product produced in response to exposure to an osteogenic
compound. 

Brief Description of the Drawings 

Figure 1A graphically depicts a restriction enzyme map of mouse genomic BMP-4
and a diagram of two transcripts. The mouse BMP-4 gene transcription unit is ~7kb and
contains 2 coding exons (closed boxes) and 3 non-coding exons, labeled exons 1A, 1B
and 2. This 19kb clone has an ~6kb 5'-flanking region and an ~7kb 3'-flanking region.
The diagram shows approximately 2.4kb of the 5'-flanking region, and a small region of
the 3'-flanking region. The lower panel shows two alternative transcripts of BMP-4.
Both have the same exons 2, 3 and 4 but a different exon 1. Transcript A has exon 1A and
transcript B has exon 1B, whose size was estimated according to RT-PCR and primer
extension analysis in FRC cells;

Figure 1B depicts the DNA sequence of selected portions of mouse genomic BMP-4
(SEQ. ID NO. 1) and the predicted amino acid sequences of the identified coding exons
(SEQ. ID NO. 2). The numbers on the right show the position of the nucleotide sequence
and the bold numbers indicate the location of the amino acid sequence of the coding region.
Most of the coding sequence is in exon 4. The end of the transcription unit was estimated
based on a 1.8kb transcript. Primer 1 in exon 1A was used in RT-PCR analysis with Primer
3 in exon 3. Primer 2 in exon 1B was used in RT-PCR analysis with Primer 3. Primers B1
and B2 were used in primer extension reactions;

Figure 1C portrays the sequence of the BMP-4 exon 1A 5'-flanking region and
potential response elements in the mouse BMP-4 1A promoter (SEQ. ID NO. 3). The
sequences of 2688 bp of the mouse BMP-4 gene are shown. Nucleotides are numbered on
the left with +1 corresponding to the major transcription start site of the 1A promoter. The
response elements of DR-1A Proximal and DR-1A Distal oligonucleotides are indicated.
The other potential response DNA elements in the boxes are p53, RB (retinoblastoma),
SP-1, AP-1, and AP-2. Primer A, indicated by the line above the DNA sequence at +114 to
+96, was used for primer extension analysis of exon 1A-containing transcripts;

Figure 2 depicts the results of a primer extension assay. Total RNAs prepared from
FRC cells (on the left frame) and 9.5 day mouse embryo (on the right) were used with
primer A or the complement of primer 2. Two major extended fragments, 67 and 115 bp,
indicated in lane A, were obtained from primer A. Two 1B primers, primer B1 and primer
B2, also gave negative results with both FRC and mouse embryo total RNA as template.
Transcript B is not detectable with this assay. By RT-PCR, transcript B can be detected
and quantified;

Figure 3A is a photographic representation of gel electrophoresis of 1A-3 and 1B-3
RT-PCR products of the BMP-4 gene. RT-PCR was performed with two pairs of primers
using FRC cell poly A+ mRNA as the template. The products were verified by the DNA
sequence;

Figure 3B is a schematic diagram of spliced BMP-4 RT-PCR products with 1A and
1B exons in FRC cells. RT-PCR was performed with two pairs of primers using FRC cell
poly A+ mRNA as the template. The diagram shows where the primers are located in the
BMP-4 genomic DNA. RT-PCR product 1A-2-3, which contains exon 1A, exon 2 and the
5' region of exon 3, was produced with primer 1 and primer 3. Primer 2 and primer 3
generated two RT-PCR products with the exon 1B-2-3 pattern. The heterogeneity in size
of exon 1B is indicated. The 1A promoter is predominantly utilized in bone cells;

Figure 4A provides a map of the BMP-4 1A 5'-flanking-CAT plasmid and
promoter activity in FRC cells. The 2.6kb EcoRI and XbaI fragment, 1.3kb Pst fragment,
0.5kb SphI and Pst fragment, and 0.25kb PCR fragment were inserted into pBLCAT3.
The closed box indicates the non-coding exon 1A. The CAT box represents the CAT
reporter gene. The values represent percentages of CAT activity, with the activity expressed
by pCAT-2.6 set at 100%. The values represent the average of four independent assays;

Figure 4B provides an autoradiogram of CAT assays using FRC cells transfected
with BMP-4 1A 5'-flanking-CAT plasmids identified in Figure 4A;

Figure 5 portrays the nucleotide sequence of the mouse BMP-2 gene 5'-flanking
region from -2736 to +139 (SEQ. ID NO. 4). The transcription start site is denoted by +1;

Figure 6A depicts an autoradiogram showing products of a primer extension assay
for determination of the transcription start site of the BMP-2 gene, separated on an 8%
denaturing urea-polyacrylamide gel, in which Lane 1: total RNA from fetal rat calvarial
osteoblast cells, and Lane 2: control lane with 10 µg of yeast tRNA. All RNA samples
were primed with a 32P-labeled oligonucleotide from exon 1 of the mouse BMP-2 gene, as
indicated in Figure 6B. Lane M: 32P-labeled MspI digested X phage DNA, containing
DNA fragments spanning from 623 bp to 15 bp (size marker);

Figure 6B provides a schematic representation of the primer extension assay. The
primer used is an 18-mer synthetic oligonucleotide, 5'-CCCGGCAAGTTCAAGAAG-3'
(SEQ. ID NO. 5);

Figure 7 provides a diagram of selected BMP-2 promoter - luciferase reporter
constructs. BMP-2 5'-flanking sequences are designated by hatched boxes and
luciferase cDNA is designated by the filled box. Base +114 denotes the 3' end of the
BMP-2 gene in all the constructs;

Figure 8 displays the luciferase enzyme activity for the BMP-2 gene-LUC
constructs (shown in Figure 7) transfected in primary fetal rat calvarial osteoblasts (A),
HeLa cells (B) and ROS 17/2.8 osteoblasts (C). The luciferase activity has been
normalized to β-galactosidase activity in the cell lysates;

Figure 9A-F depicts the DNA sequence of the mouse BMP-2 promoter and gene
(SEQ. ID NO. 6);

Figure 10A-D depicts the DNA sequence of the mouse BMP-4 promoter and gene
(SEQ. ID NO. 7); and

Figure 11 depicts the resequencing of the BMP-2 5' flanking region.

Detailed Description of the Preferred Embodiments

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

The present invention is distinguished from other techniques for identifying
bone-active compounds, as it specifically identifies chemical compounds, agents, factors or
other substances which stimulate bone cells to produce the bone growth factors in the bone
morphogenetic protein (BMP) family (hereinafter "osteogenic agents"). These osteogenic
agents are identified by their capacity to increase the activity of the promoters of genes of
members of the BMP family and other bone growth factors which are normally produced
by bone cells, and other cells including cartilage cells, tumor cells and prostatic cells. When
patients are treated with such chemical compounds, the relevant BMP will be produced by
bone cells and then be available locally in bone to enhance bone growth or bone healing.
Such compounds identified by this assay technique will be used for the treatment of
osteoporosis, segmental bone defects, fracture repair, prosthesis fixation or any disease
associated with bone loss.

Compounds that inhibit bone morphogenetic protein expression in bone or cartilage
may also be useful in clinical situations of excess bone formation which occurs in such
diseases as osteoblastic metastases or osteosclerosis of any cause. Such compounds can
also be identified in accordance with the present invention.

Also provided in accordance with the present invention are isolated DNA sequences
encoding a promoter region of at least one bone morphogenetic protein, and a system for
identifying osteogenic agents comprising an expression vector comprising such promoter
sequences operatively linked to a reporter gene encoding an assayable product, and means
for detecting the assayable product produced in response to exposure to an osteogenic
compound.

The promoters of the genes for BMP-4 and BMP-2 are complex promoters which
can be linked to reporter genes, such as the firefly luciferase gene. When the hybrid
genes (for example, bone cell BMP-4 promoter or bone cell BMP-2 promoter and firefly
luciferase or chloramphenicol acetyl transferase (CAT) cDNAs, or cDNAs for other
reporter genes such as β-galactosidase, green fluorescent protein, human growth hormone,
alkaline phosphatase, β-glucuronidase, and the like) are transfected into bone cells,
osteogenic agents which activate the BMP-4 or BMP-2 promoters can be identified by their
capacity in vitro to increase luciferase activity in cell lysates after cell culture with the
agent.
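
By way of illustration only, a readout of this kind might be scored as in the following
Python sketch, which computes the fold induction of reporter activity for each test
compound relative to vehicle-treated control wells and flags compounds at or above a
cutoff. The compound names, the activity values and the 2-fold cutoff are hypothetical
assumptions and are not taken from the present disclosure.

```python
from statistics import mean


def fold_induction(treated, vehicle):
    """Mean reporter activity of treated wells divided by mean vehicle activity."""
    baseline = mean(vehicle)
    if baseline <= 0:
        raise ValueError("vehicle control activity must be positive")
    return mean(treated) / baseline


def call_hits(plate, vehicle, cutoff=2.0):
    """Return {compound: fold} for compounds whose fold induction meets the cutoff.

    The default cutoff of 2.0 is an arbitrary, hypothetical choice.
    """
    folds = {name: fold_induction(wells, vehicle) for name, wells in plate.items()}
    return {name: fold for name, fold in folds.items() if fold >= cutoff}


# Hypothetical luciferase readings (relative light units), triplicate wells.
vehicle_wells = [100.0, 110.0, 90.0]
plate = {
    "compound_A": [310.0, 290.0, 300.0],  # roughly 3-fold over vehicle
    "compound_B": [105.0, 95.0, 100.0],   # no change
    "compound_C": [45.0, 55.0, 50.0],     # reduced activity
}

hits = call_hits(plate, vehicle_wells)
print(hits)
```

A compound that lowers reporter activity, such as the hypothetical compound_C, would not
be flagged by this cutoff; a separate lower bound could be applied in the same way when
inhibitors of promoter activity are sought.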

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously identified or isolated, and methods for regulating transcription have not been
shown. The present invention isolates the promoters for the BMP genes and utilizes these
promoters in cultured bone cells so that agents could be identified which specifically
increase BMP-2 or BMP-4 production locally in bone. Since it is known that the BMPs are
produced by bone cells, a method for enhancing their production specifically in bone should
avoid systemic toxicity. This benefit is obtained by utilizing the unique tissue specific
promoters for the BMPs which are provided herein, and then using these gene promoters to
identify agents which enhance their activity in bone cells.

By utilizing the disclosure provided herein, other promoters can be obtained from
additional bone morphogenetic proteins such as BMP-3, BMP-5, BMP-6, and BMP-7, to
provide comparable benefits to the promoters herein specifically described.

In addition, the present invention contemplates the use of promoters from additional
growth factors in osteoblastic cells. Included are additional bone morphogenetic proteins,
as well as fibroblast growth factors (e.g. FGF-1, FGF-2, and FGF-7), transforming growth
factors β-1, β-2, and β-3, insulin-like growth factor-1, insulin-like growth factor-2,
platelet-derived growth factor, and the like. Such promoters will readily be utilized in the
present invention to provide comparable benefits.

The cells which can be utilized in the present invention include primary cultures of
fetal rat calvarial osteoblasts, established bone cell lines available commercially (MC3T3-E1
cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2 cells, and the like
as provided in the catalog from the American Type Culture Collection (ATCC)), and bone
cell lines established from transgenic mice, as well as other cell lines capable of serving as
hosts for the present vectors and systems. In addition, a number of tumor cell lines also
express BMPs, including the prostate cancer cell lines PC3, LNCaP, and DU145, as well
as the human cancer cell line HeLa. Thus, any of a number of cell lines will find use in the
present invention, and the selection of an appropriate cell line will be a matter of choice for
a particular embodiment.

The following examples serve to illustrate certain preferred embodiments and 
aspects of the present invention and are not to be construed as limiting the scope thereof.

EXPERIMENTAL 

In the experimental disclosure which follows, the following abbreviations apply: eq
(equivalents); M (Molar); mM (millimolar); µM (micromolar); N (Normal); mol (moles);
mmol (millimoles); µmol (micromoles); nmol (nanomoles); kg (kilograms); gm (grams); mg
(milligrams); µg (micrograms); ng (nanograms); L (liters); ml (milliliters); µl (microliters);
vol (volumes); and °C (degrees Centigrade).

Example 1: DESCRIPTION AND CHARACTERIZATION OF MURINE
BMP-4 GENE PROMOTER

(a) Library Screening, Cloning and Sequencing of Gene 

A mouse genomic Lambda FIX II spleen library (Stratagene, La Jolla, CA) was
screened with a mouse embryo BMP-4 cDNA kindly provided by Dr. B.L.M. Hogan
(Vanderbilt University School of Medicine, Nashville, TN). The probe was labeled with
[α-32P]dCTP using a random-primer labeling kit from Boehringer-Mannheim (Indianapolis,
IN). Plaque lift filters were hybridized overnight in 6X SSC, 5X Denhardt's, 0.5% SDS
containing 200 µg/ml sonicated salmon sperm DNA, 10 µg/ml Poly A and 10 µg/ml tRNA at
68°C. The filters were washed at 55°C for 20 min, twice in 2X SSC, 0.1% SDS buffer and
once in 0.5X SSC, 0.1% SDS. The isolated phage DNA clones were analyzed according to
standard procedures (Sambrook et al., 1989).

Fragments from positive clones were subcloned into pBluescript vectors
(Stratagene, La Jolla, CA) and sequenced in both directions using the Sequenase
dideoxynucleotide chain termination sequencing kit (U.S. Biochemical Corp., Cleveland,
OH).

Three clones were isolated from 2x10* plaques of mouse spleen 129 genomic library 
using full length coding region mouse embryo BMP-4 cDNA probe (B. Hogan, Vanderbilt 
University, Nashville, TN). One 19kb clone contained 5 exons, a ~6kb 5'-flanking region
and a ~7kb 3'-flanking region, as shown in Figure 1A. The 7kb transcription unit and the
5'-flanking region of the mouse BMP-4 gene were sequenced (Figure 10).

The nucleotide sequence of selected portions of mouse BMP-4 and the deduced
amino acid sequence of the coding exons (408 residues; SEQ. ID NO. 2) are shown in Figure
1B. Primers used in the RT-PCR experiments described below are indicated in this Figure.

Figure 1C shows the DNA sequence of 2372bp of the 5'-flanking region and the
candidate DNA response elements upstream of exon 1A. Primers used in primer extensions
are also shown in Figures 1B and 1C.

(b) Primer Extension Mapping of the Transcriptional Start-Site of the Mouse BMP-4 
Gene 

The transcriptional start-sites were mapped by primer extension using the synthetic
oligonucleotide primer A 5'-CGGATGCCGAACTCACCTA-3' (SEQ. ID NO. 8),
corresponding to the complement of nucleotides +114 to +96 in the exon 1A sequence, and
the oligonucleotide primer B1 5'-CTACAAACCCGAGAACAG-3' (SEQ. ID NO. 9),
corresponding to the complement of nucleotides +30 to +13 of the exon 1B sequence.
Total RNA from fetal rat calvarial (FRC) cells and 9.5 day mouse embryo (gift of B.
Hogan, Vanderbilt University) was used with both primers. The primer extension assay
was carried out using the primer extension kit from Promega (Madison, WI). The
annealing reactions were, however, carried out at 60°C in a water bath for 1 hr. The
products were then electrophoresed on 8% denaturing-urea polyacrylamide gels and
autoradiographed.

One additional oligonucleotide primer B2 5'-CCCGGCACGAAAGGAGAC-3'
(SEQ. ID NO. 10), corresponding to the complement of nucleotide sequence +69 to +52 of
exon 1B, was also utilized in primer extension reactions with FRC and mouse embryo
RNAs.
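
The relationship between each primer and the stated coordinates can be checked with a
short Python sketch such as the one below. It confirms that the length of each primer
equals the length of the nucleotide span it is said to complement, and computes the
reverse complement of each primer, i.e. the sense-strand sequence to which the primer
would anneal. The primer sequences and coordinates are those given above; the computed
sense-strand sequences are derived purely by base pairing and have not been compared
with the sequence listings, and the helper names are hypothetical.

```python
# Primer sequences (5'->3') and the exon coordinates they are stated to complement.
PRIMERS = {
    "A": ("CGGATGCCGAACTCACCTA", 114, 96),   # exon 1A, SEQ. ID NO. 8
    "B1": ("CTACAAACCCGAGAACAG", 30, 13),    # exon 1B, SEQ. ID NO. 9
    "B2": ("CCCGGCACGAAAGGAGAC", 69, 52),    # exon 1B, SEQ. ID NO. 10
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(first, last):
    """Number of nucleotides in an inclusive coordinate span."""
    return abs(first - last) + 1


# For each primer: does its length match the stated span, and what would it anneal to?
checks = {}
for name, (seq, first, last) in PRIMERS.items():
    checks[name] = {
        "length": len(seq),
        "span": span_length(first, last),
        "consistent": len(seq) == span_length(first, last),
        "sense_strand": reverse_complement(seq),
    }

for name, result in checks.items():
    print(name, result)
```

All three primers are length-consistent with their stated spans (19, 18 and 18
nucleotides, respectively).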

1. Evidence for utilization of two alternate exon 1 sequences for the BMP-4 gene.

Several BMP-4 cDNAs were sequenced from the prostate cancer cell line PC-3 and from
primary FRC cells. Four independent FRC cell BMP-4 cDNAs all contained exon 1A.
However, the human prostate carcinoma cell line (PC-3) cDNA contained an apparently
unique exon 1B sequence spliced to exon 2 (Chen et al., 1993). A double-stranded
oligonucleotide probe (70bp) to exon 1B was synthesized based on the human PC-3 exon
1B sequence. This exon 1B probe was then used to identify the exon 1B region in the
mouse genomic BMP-4 clone. The candidate exon 1B is 1696bp downstream from the 3'
end of exon 1A.

2. Primer extension analysis

Primer extension analysis was performed to map the mouse BMP-4 gene
transcription start sites. Primer A, an oligonucleotide from exon 1A, and two
oligonucleotides from exon 1B were used. Total RNA was utilized both from mouse embryo
and FRC cells. As shown in Figure 2, a major extended fragment from primer A was obtained
in both mouse embryo and FRC cell total RNAs, which migrates at 115bp. The extended
5'-end of the 115bp fragment represents the major transcription start site for 1A-containing
transcripts. The size of this 5' non-coding exon 1A is 306bp. A major extended fragment
from the complement of primer B1 (exon 1B) was not detected using either mouse embryo
or FRC cell total RNAs. One other primer from exon 1B also gave negative results,
indicating that in 9.5 day mouse embryo and FRC cells the exon 1B-containing transcripts
were not detectable, which suggests that transcripts containing exon 1B are less abundant
in these cells and tissues than transcripts containing exon 1A. All primer extensions were
carried out after annealing of primers at high stringency. Lower stringency annealing with
1B primers gave extended products not associated with BMP-4 mRNA.

(c) BMP-4 Gene 5' Flanking Region for Exon 1A and 1B Transcripts.

Four FRC BMP-4 cDNAs were sequenced and found to contain exon 1A sequences
spliced to exon 2. The human U2OS BMP-4 cDNA sequence also contains exon 1A
(Wozney et al., 1988). This suggests the BMP-4 gene sequences upstream of exon 1A are
used primarily in bone cells.

To test whether the BMP-4 1B promoter is utilized at all in FRC cells,
oligonucleotide primers were designed to ascertain whether spliced 1B-2-3 exon products
and 1A-2-3 exon (control) products could be obtained by the more sensitive RT-PCR
technique using FRC poly(A+) RNA. The 3' primer was in exon 3 (Figure 1B - Primer 3)
and the 5' primers were either in exon 1A (primer 1) or exon 1B (primer 2).

The RT-PCR products were cloned and sequenced. A photograph and diagram of
the products obtained are presented in Figures 3A and 3B. Both 1A-2-3 and 1B-2-3
products were obtained. The results indicate FRC osteoblasts produce transcripts with
either a 1A exon or a 1B exon, but not both. This suggests that the intron region between
the 1A and 1B exons could contain regulatory response elements under certain conditions. Of
the 1B-2-3 RT-PCR products obtained from FRC osteoblasts, two products were obtained
with different 3' splice sites for exon 1B. By comparison with the genomic DNA, both
3' ends of the two exon 1Bs have reasonable 5' splice consensus sequences, consistent with
an alternate splicing pattern obtained for the 1B-2-3 RT-PCR products. Most importantly,
no 1A-1B-2-3 RT-PCR splice products of the BMP-4 gene were obtained. Thus, 1B does
not appear to be an alternatively spliced 5' non-coding exon. By quantitative RT-PCR, it
was shown that 1A transcripts are 10 to 15X more abundant than 1B transcripts in primary
bone cells.

The RT-PCR technique was performed as follows. First-strand cDNA was
synthesized from 1 µg FRC cell poly(A+) RNA with an 18-mer dT primer using
Superscript™ reverse transcriptase (Gibco BRL) in a total volume of 20 µl. The cDNA
was then used as a template for PCR with two sets of synthesized primers. As shown in
Figure 1B, primer 1 (5'-GAAGGCAAGAGCGCGAGG-3') (SEQ. ID NO. 11),
corresponding to a 3' region of exon 1A, and primer 3 (5'-CCGGTCTCAGGTATCA-3')
(SEQ. ID NO. 12), corresponding to a 5' region of exon 3, were used to generate the exon
1A-2-3 spliced PCR product. Primer 2 (5'-CAGGCGGAAAGCTGTTC-3') (SEQ. ID NO.
13), corresponding to a 3' region (+2 to +18) of exon 1B, and primer 3 were used to
generate exon 1B-2-3 spliced PCR products. The GeneAmp PCR kit was used according to the
manufacturer's procedure (Perkin-Elmer/Cetus, Norwalk, CT). Each cycle consisted of a
denaturation step (94°C for 1 min), an annealing step (59°C for 2 min) and an elongation
step (72°C for 1 min). The PCR products were analysed by agarose gel electrophoresis for
size determination. The products were subcloned into the pCR II vector using the TA cloning
kit (InVitrogen, San Diego, CA). The inserts were sequenced in both directions with a
sequencing kit from U.S. Biochemical (Cleveland, OH).

Northern analysis demonstrated that the single 1.8kb BMP-4 transcript detected in
FRC cells during bone cell differentiation hybridizes to both a pure 1A exon probe and an
exon 2-4 probe. The ratio of the 1A to 2-4 signal is constant through the changing levels of
BMP-4 expression during differentiation. Using a 1B exon probe, no detectable
hybridization to the BMP-4 exon 2-4 1.8kb signal was observed. This again indicates that
1A-containing transcripts predominate in bone cells, although 1B transcripts can be
detected by the more sensitive PCR method. By quantitative PCR it was shown that 1A
transcripts are 10-15X more abundant than 1B in FRC cells.

(d) BMP-4 Promoter 1A Plasmid Construction and Transfection, and Detection of
Promoter Activity in Osteoblasts.

Three BMP-4 1A promoter/plasmids were constructed by excising fragments from
the 5' flanking region of the mouse BMP-4 gene and cloning into pBLCAT3 expression
vectors (Luckow and Schutz, 1987). The pCAT-2.6 plasmid was the pBLCAT3 vector
with a 2.6kb EcoRI and XbaI fragment (-2372/+258) of the BMP-4 gene. The pCAT-1.3
plasmid was similarly generated from a 1.3kb Pst fragment (-1144/+212). The pCAT-0.5
plasmid was made from a 0.5kb SphI and Pst fragment (-260/+212). Both the pCAT-1.3
and the pCAT-0.5 plasmids have 212bp of exon 1A non-coding region. An additional
promoter/plasmid was created from a PCR amplified product, corresponding to the 240bp
sequence between nucleotides -25 and +212, and referred to as pCAT-0.24. The
amplified fragment was first cloned into the pCR II vector using the TA cloning kit
(InVitrogen, San Diego, CA) and then the fragment was released with Hind III and Xho I,
and religated into pBLCAT3. Correct orientation of all inserts with respect to the CAT
vector was verified by DNA sequencing.

The cells used for transient transfection studies were isolated from 19 day-old fetal
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows
et al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed
and cleaned by washing in α minimal essential medium (αMEM) containing 10% v/v fetal
calf serum (FCS) and antibiotics. The bones were minced with scissors and were
transferred to a 35mm tissue culture dish containing 5ml of sterile bacterial collagenase
(0.1%) and trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells
released at this time were collected and immediately mixed with an equal volume of FCS to
inactivate trypsin. This procedure was repeated 6 times to release cells at 20 min intervals.
Cells released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were
combined and the cells were collected by centrifugation at 40 x g for 5 min. The cells were
then plated in αMEM containing 10% FCS and antibiotics and were grown to confluency
(2-3 days). At this stage the cells were plated for transfection in 60mm tissue culture
dishes at a cell density of 5 x 10^5 cells per dish. These primary osteoblast cultures are
capable of self-organizing into bone-like structures in prolonged cultures (Bellows et al.,
1986; Harris et al., 1994). HeLa, ROS 17/2.8, and CV-1 cells were purchased from the
ATCC.

The isolated FRC cells, enriched for the osteoblast phenotype, were used as
recipient cells for transient transfection assays. BMP-4 mRNA is modulated in these cells
in a transient fashion during prolonged culture (Harris et al., 1994b). The technique of
electroporation was used for DNA transfection (Potter, 1988; van den Hoff et al., 1992).
After electroporation, the cells were divided into aliquots, replated in 100mm diameter
culture dishes and cultured for 48 hours in modified Eagle's minimal essential media
(MEM, GIBCO, Grand Island, NY) with 10% fetal calf serum (FCS). The extracts were
assayed for CAT activity according to the method described by Gorman (1988) and CAT
activity was normalized by β-galactosidase assay according to the method of Rouet et al.
(1992).
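
The normalization step can be illustrated with the following Python sketch, which divides
each CAT reading by the β-galactosidase reading from the same lysate (to correct for
differences in transfection efficiency) and then expresses the result as a percentage of
a chosen reference construct. The construct labels and all numerical readings are
hypothetical and are provided only to show the arithmetic; they are not data from the
present experiments.

```python
def normalize(reporter_signal, beta_gal_signal):
    """Reporter activity per unit of co-measured beta-galactosidase activity."""
    if beta_gal_signal <= 0:
        raise ValueError("beta-galactosidase signal must be positive")
    return reporter_signal / beta_gal_signal


def percent_of_reference(samples, reference):
    """Express normalized activities as a percentage of the reference construct."""
    normalized = {
        name: normalize(cat, beta_gal) for name, (cat, beta_gal) in samples.items()
    }
    reference_value = normalized[reference]
    return {name: 100.0 * value / reference_value for name, value in normalized.items()}


# Hypothetical raw readings per lysate: (CAT signal, beta-galactosidase signal).
samples = {
    "reference_construct": (8000.0, 2.0),
    "construct_X": (3000.0, 1.5),
    "construct_Y": (400.0, 1.0),
}

relative_activity = percent_of_reference(samples, "reference_construct")
print(relative_activity)
```

With these made-up readings the reference construct is 100%, construct_X is 50% and
construct_Y is 10% of the reference.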

After 48 hrs of transfection with various BMP-4-CAT reporter gene plasmid
constructs, the cells were harvested and the CAT activity was determined. As indicated in
Figures 4A and 4B, the pCAT-0.24 plasmid (-25/+212) has little CAT activity. This plasmid
contains -25 to +212 of the 5' non-coding exon 1A, and its activity was 3-fold lower than
that of the parent pBLCAT3 plasmid. The pCAT-0.5 (-260/+212), pCAT-1.3 (-1144/+212),
and pCAT-2.6 (-2372/+258) plasmids showed progressively increasing CAT activity when
transfected into FRC cells. These data are shown in Figure 4B. With pCAT-0.5 (-260/+212)
there is a 10-fold increase in CAT activity relative to pCAT-0.24 (-25/+212). pCAT-1.3
(-1144/+212) shows a further 6-fold increase and pCAT-2.6 (-2372/+258) shows a further
2-fold change over pCAT-1.3 (-1144/+212). Thus the net increase in CAT activity between
pCAT-0.24 (-25/+212) and pCAT-2.6 (-2372/+258) in FRC cells is approximately 100-fold.
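
The net increase follows from multiplying the stepwise fold changes quoted above, as the
following brief Python sketch shows. The step values are the rounded figures given in the
text; the variable names are hypothetical.

```python
from math import prod

# Stepwise fold increases in CAT activity between successive constructs,
# using the rounded values stated in the text.
steps = [
    ("pCAT-0.24 -> pCAT-0.5", 10),
    ("pCAT-0.5 -> pCAT-1.3", 6),
    ("pCAT-1.3 -> pCAT-2.6", 2),
]

# Net fold increase from the shortest to the longest construct.
cumulative_fold = prod(fold for _, fold in steps)
print(cumulative_fold)  # 120, i.e. on the order of 100-fold

# Activity of each construct as a percentage of pCAT-2.6 (set at 100%).
percent_of_longest = {
    "pCAT-2.6": 100.0,
    "pCAT-1.3": 100.0 / 2,
    "pCAT-0.5": 100.0 / (2 * 6),
    "pCAT-0.24": 100.0 / cumulative_fold,
}
print(percent_of_longest)
```

The product of the rounded steps is 120, which is consistent with the stated net increase
of approximately 100-fold.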



Example 2: DESCRIPTION AND CHARACTERIZATION OF 
MURINE BMP-2 GENE PROMOTER 
(a) Cloning of Mouse BMP-2 Genomic DNA.

Genomic clones of the mouse BMP-2 gene were isolated in order to determine the
transcriptional regulation of the BMP-2 gene in primary osteoblasts. 5 x 10^6 plaques were
screened from a mouse genomic library, B6/CBA (purchased from Stratagene, San Diego,
CA), using BMP-2 cDNA as probe. The BMP-2 cDNA clone was isolated from a cDNA
library of PC3 prostate cancer cells (Harris et al., 1994). The human BMP-2 probe was a
1.1 kb Sma I fragment containing most of the coding region.

The BMP-2 genomic clones were sequenced by the dideoxy chain termination method
(Sanger et al., 1977), using deoxyadenosine 5'-[α-[35S]thio]triphosphate and Sequenase
(United States Biochemical, Cleveland, OH). All fragments were sequenced at least twice
and overlaps were established using the appropriate oligonucleotide primer. Primers were
prepared on an Applied Biosystems Model 392 DNA Synthesizer. Approximately 16kb of
one of these BMP-2 clones was completely sequenced (Figure 9). Analysis of this
sequence showed that the mouse BMP-2 gene contains one non-coding and two coding exons
(Feng et al., 1994). Analysis of the 5' flanking sequence showed that the BMP-2 gene does
not contain typical TATA or CAAT boxes. However, a number of putative response
elements and transcription factor recognition sequences were identified upstream of exon 1
(Figure 5). The 5'-flanking region is GC rich with several SP-1, AP-1, P53, E-box,
homeobox, and AP-2 candidate DNA binding elements.

(b) Analysis of Transcription Start Site for BMP-2 Gene. 

The transcription start sites for the BMP-2 gene were identified using the primer
extension technique. Primer extension was carried out as described (Hall et al., 1993).
The primer used was a 32P-labeled 18-mer oligonucleotide 5'-CCCGGCAAGTTCAAGAAG-3'
(SEQ ID NO:5). Total RNA obtained from primary fetal rat calvarial osteoblasts was
used for the primer extension. The results are shown in Figure 6. The major extension
product was 68bp and was used to estimate the major transcription start site (+1, Figure
5). These results were confirmed by RNase protection assays.

(c) Identification of BMP-2 Promoter and Enhancer 
Activity Using Luciferase (LUC) Reporter Gene Constructs. 

The BMP-2-LUC constructs (Figure 7) were designed to contain variable 5'
boundaries from BMP-2 5'-flanking sequences spanning the transcription start site (+1).
Each construct contained the 3' boundary at +114 in exon 1 (Figure 6). These constructs
were individually transfected into primary cultures of fetal rat calvarial osteoblasts, ROS
17/2.8 osteosarcoma cells, HeLa cells, and CV-1 cells by the calcium-phosphate
precipitation technique, and the promoter activity of each construct was assayed
24 hrs following transfection by measuring the luciferase enzyme activity in each
individual cell lysate. The LUC (luciferase enzyme assay) technique is described below
under (f). Plasmid pSVβGal was co-transfected with each plasmid construct to normalize
for the transfection efficiency in each sample. The experiments were repeated at least five
times in independent fetal rat calvarial cultures, with each assay done in triplicate. The
mean values from a representative experiment are shown in Figure 8.

(d) Isolation of Primary Fetal Rat Calvarial Osteoblasts for Functional Studies 
of BMP-2 Gene Promoter. 

The cells used for transient transfection studies were isolated from 19 day-old fetal
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows et
al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed
and cleaned by washing in α minimal essential medium (αMEM) containing 10% v/v fetal
calf serum (FCS) and antibiotics. The bones were minced with scissors and transferred
to a 35 mm tissue culture dish containing 5 ml of sterile bacterial collagenase (0.1%) and
trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells released at this
time were collected and immediately mixed with an equal volume of FCS to inactivate
trypsin. This procedure was repeated 6 times to release cells at 20 min intervals. Cells
released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were combined and
the cells were collected by centrifugation at 400 g for 5 min. The cells were then plated in
αMEM containing 10% FCS and antibiotics and were grown to confluency (2-3 days). At
this stage the cells were plated for transfection in 60 mm tissue culture dishes at a cell
density of 5 x 10^5 cells per dish. These primary osteoblast cultures are capable of forming
mineralized bone in prolonged cultures (Bellows et al., 1986; Harris et al., 1994). HeLa, ROS 17/2.8,
and CV-1 cells were purchased from the ATCC.

(e) Transient Transfection Assay. 

For the transient transfection assay, the primary osteoblast cells were plated at the
above mentioned cell density 18-24 hrs prior to transfection. The transfection was carried
out using a modified calcium-phosphate precipitation method (Graham & van der Eb 1973;
Frost & Williams 1978). The cells were incubated for 4 hrs at 37°C with 500µl of a
calcium phosphate precipitate of plasmid DNA containing 10µg of reporter plasmid
construct and 1µg of pSVβGal (for normalization of transfection efficiency) in 0.15M
CaCl2 and Hepes buffered saline (21mM Hepes, 13.5mM NaCl, 5mM KCl, 0.7mM
Na2HPO4, 5.5mM dextrose, pH 7.05-7.1). After the 4 hr incubation of cells with
precipitate, the cells were subjected to a 2 min treatment with 15% glycerol in αMEM,
followed by addition of fresh αMEM containing insulin, transferrin and selenium (ITS)
(Upstate Biotechnology, Lake Placid, NY). The cells were harvested 24 hrs post
transfection.

(f) Luciferase and β-galactosidase Assay.

Cell lysates were prepared and the luciferase enzyme assay was carried out using assay
protocols and the assay kit from Promega (Madison, WI). Routinely 20µl of cell lysate
was mixed with 100µl of luciferase assay reagent (270µM coenzyme A, 470µM luciferin
and 530µM ATP) and the luciferase activity was measured for 10 sec in a TURNER
TD-20e luminometer. The values were normalized with respect to the β-galactosidase
enzyme activity obtained for each experimental sample.

The β-galactosidase enzyme activity was measured in the cell lysate using a 96 well
microtiter plate according to Rouet et al. (1992). 10-20µl cell lysate was added to 90-80µl
β-galactosidase reaction buffer containing 88mM phosphate buffer, pH 7.3, 11mM
KCl, 1mM MgCl2, 55mM β-mercaptoethanol, and 4.4mM chlorophenol red
β-D-galactopyranoside (Boehringer-Mannheim Corp., Indianapolis, IN). The reaction
mixture was incubated at 37°C for 30-60 min, depending on transfection efficiency, and the
samples were read with an ELISA plate reader at 600nm.
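The normalization described in this section, in which each luciferase reading is divided by the β-galactosidase reading of the same lysate and triplicates are averaged, can be sketched as follows. This is a minimal illustration and not the inventors' software: the function names and the triplicate readings are hypothetical.

```python
from statistics import mean

def normalized_activity(luciferase_units, beta_gal_absorbance):
    """Divide a luciferase reading by the beta-galactosidase reading of the same lysate."""
    if beta_gal_absorbance <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase_units / beta_gal_absorbance

def mean_normalized(replicates):
    """Average the normalized activity over (luciferase, beta-gal) replicate pairs."""
    return mean(normalized_activity(luc, gal) for luc, gal in replicates)

# Hypothetical triplicate readings for one construct (illustrative numbers only):
# (luminometer units, absorbance at 600nm)
triplicate = [(1200.0, 0.5), (500.0, 0.25), (1200.0, 0.75)]
print(mean_normalized(triplicate))  # 2000.0
```

Dividing by the co-transfected β-galactosidase signal corrects each well for differences in transfection efficiency before constructs are compared.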

(g) Plasmid Construction 

The luciferase basic plasmid (pGL basic) was the vector used for all constructs
(purchased from Promega, Madison, WI). Different lengths of DNA fragments from the
BMP-2 5'-flanking region were cloned at the multiple cloning sites of this plasmid, which are
upstream of the firefly luciferase cDNA. The BMP-2 DNA fragments were isolated either
by using available restriction enzyme sites (constructs -196/+114, -876/+114, -1995/+114,
-2483/+114, and -2736/+114) or by polymerase chain reaction using specific oligonucleotide
primers (constructs -23/+114, -123/+114 and +29/+114).

The minimal promoter activity for the BMP-2 gene was identified in the shortest
construct containing 23bp upstream of the transcription start site (-23/+114). No luciferase
activity was noted in the construct that did not include the transcription start site
(+29/+114). Two other constructs containing increasing lengths of 5' sequences up to
-196bp showed reproducible decreases in promoter activity in fetal rat calvarial osteoblasts
and HeLa cells (Figure 8). The -876/+114 construct showed a 5-fold increase in activity in
HeLa cells. The -1995/+114, -2483/+114 and -2736/+114 constructs showed decreased
promoter activity when compared to the -876/+114 construct only in HeLa cells (Figure 8).

In the primary fetal rat calvarial osteoblasts, the 2.6kb construct (-2483/+114)
demonstrated a 2-3-fold increase in luciferase activity over that of the -1995/+114
construct (Figure 8). These results suggest that one or more positive response regions are
present between -196 and -1995 and that the DNA sequence between -1995 and -2483bp
has other positive regulatory elements that could modulate BMP-2 transcription. The
largest 2.9kb construct (-2736/+114) repeatedly demonstrated a 20-50% decrease in
promoter activity compared to the -2483/+114 construct in these primary fetal rat calvarial
osteoblasts (Figure 8).
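The construct names give 5' and 3' boundaries relative to the transcription start site, and the fragment sizes (for example 2.6kb and 2.9kb) follow from those boundaries. The sketch below assumes the usual convention that the start site is +1 and that there is no position 0; that convention and the function name are assumptions made here for illustration, not statements from the specification.

```python
def insert_length(five_prime, three_prime):
    """Length in bp of a fragment spanning five_prime..three_prime.

    Positions are numbered relative to the transcription start site (+1),
    with no position 0, so -23/+114 spans 23 + 114 = 137 bp.
    """
    if five_prime == 0 or three_prime == 0:
        raise ValueError("there is no position 0 in this numbering")
    if five_prime > three_prime:
        raise ValueError("5' boundary must lie upstream of the 3' boundary")
    length = three_prime - five_prime + 1
    # Subtract one when the span crosses from -1 to +1, because 0 is skipped.
    if five_prime < 0 < three_prime:
        length -= 1
    return length

constructs = [(-23, 114), (-196, 114), (-876, 114),
              (-1995, 114), (-2483, 114), (-2736, 114)]
for five, three in constructs:
    print(f"{five}/+{three}: {insert_length(five, three)} bp")
```

Under this convention the -2483/+114 and -2736/+114 fragments come to 2597 bp and 2850 bp, in agreement with the 2.6kb and 2.9kb labels used in the text.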

In ROS 17/2.8 osteosarcoma cells, the BMP-2 promoter activity was consistently
higher than in either the primary fetal rat calvarial osteoblasts or HeLa cells (Figure 8). All of
the deletion constructs showed similar promoter activity in ROS 17/2.8 osteosarcoma cells.
The transformed state of ROS 17/2.8 cells may be responsible for the marked expression of
the BMP-2 gene. ROS 17/2.8 cells represent a well differentiated osteosarcoma and they
produce high levels of BMP-2 mRNA. They form tumors in nude mice with bone-like
material in the tumor (Majeska et al., 1978; Majeska et al., 1980).

(h) Specificity of the BMP-2 Promoter. 

To analyze the activity of the BMP-2 promoter in cell types not expressing BMP-2
mRNA, BMP-2 promoter constructs were transfected into CV-1 cells (monkey kidney
cells). The BMP-2 promoter activity was found to be very low for all constructs. This
suggests that this region of the BMP-2 promoter is functional only in cells such as primary
fetal rat calvarial osteoblasts, HeLa and ROS 17/2.8 that express endogenous BMP-2
mRNA (Anderson & Coulter 1968). CV-1 cells do not express BMP-2 mRNA. The
BMP-2 promoter is likely active in other cell types that express BMP-2, such as prostate
cells and chondrocytes, although regulation of transcription may be different in these cells.

Example 3: USE OF PLASMID CONSTRUCTS CONTAINING BMP
PROMOTERS WITH REPORTER GENES TO IDENTIFY 
OSTEOGENIC AGENTS 

Plasmid constructs containing BMP promoters with reporter genes have been
transfected into osteoblastic cells. The cells which have been utilized include primary
cultures of fetal rat calvarial osteoblasts, cell lines obtained as gifts or commercially
(MC3T3-E12 cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2
cells, and the like as provided in the catalog from the ATCC) and bone and cartilage cell
lines established from transgenic mice. The bone cells are transfected transiently or stably
with the plasmid constructs, exposed to the chemical compound, agent or factor to be
tested for 48 hours, and then luciferase or CAT activity is measured in the cell lysates.

Regulation of expression of the growth factor is assessed by culturing bone cells in
αMEM medium with 10% fetal calf serum, 1% penicillin/streptomycin and 1%
glutamine. The cells are placed in microtiter plates at a cell density of 5x10^3
cells/100µl/well. The cells are allowed to adhere and are then incubated at 37°C at 5% CO2 for 24
hours, after which the media is removed and replaced with 50µl αMEM and 4% fetal calf
serum. A 50µl aliquot containing the compound or factor to be tested in 0.1% BSA solution
is added to each well. The final volume is 100µl and the final serum concentration is 2%
fetal calf serum. Recombinant rat BMP-2 expressed in Chinese hamster ovary cells is
used as a positive control.
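The final volume and serum concentration stated above follow from mixing equal volumes of medium and test sample, as the short calculation below illustrates. The function name is hypothetical, and the sketch assumes that the 50µl test aliquot itself contains no serum.

```python
def final_serum_percent(medium_ul, medium_serum_pct, sample_ul, sample_serum_pct=0.0):
    """Return (total volume in ul, serum % v/v) after adding a sample aliquot to medium."""
    total_ul = medium_ul + sample_ul
    serum_ul = medium_ul * medium_serum_pct / 100 + sample_ul * sample_serum_pct / 100
    return total_ul, 100 * serum_ul / total_ul

# 50 ul of medium with 4% FCS plus 50 ul of serum-free test sample
total, serum = final_serum_percent(50, 4.0, 50)
print(total, serum)  # 100 2.0
```

Because the aliquot is diluted 1:1 in the well, any test compound would likewise need to be prepared at twice its intended final concentration.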

The treated cells are incubated at 37°C at 5% CO2 for 48 hours. The media is then
removed and the cells are rinsed 3 times with phosphate buffered saline (PBS). Excess
PBS is removed from the wells and 100µl of cell culture lysing reagent (Promega #E153A)
is added to each well. After 10 minutes, 10µl of the cell lysate is added to a 96-well white
luminometric plate (Dynatech Labs #07100) containing 100µl luciferase assay buffer with
substrate (Promega #E152A). The luciferase activity is read using a Dynatech ML2250
automated 96-well luminometer. The data is expressed as either picograms of luciferase
activity per well or picograms of luciferase per µg protein.
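Expressing the readout as picograms of luciferase per µg protein is a simple ratio, sketched below. The comparison against a vehicle-treated well is added here only as an illustrative assumption about how such values might be compared; the function names and numbers are hypothetical.

```python
def luciferase_per_ug_protein(luciferase_pg, protein_ug):
    """Express luciferase as picograms per microgram of lysate protein."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return luciferase_pg / protein_ug

def fold_over_vehicle(treated, vehicle):
    """Ratio of a treated well to the vehicle-only well (illustrative comparison)."""
    return treated / vehicle

vehicle = luciferase_per_ug_protein(2.0, 8.0)  # 0.25 pg per ug protein
treated = luciferase_per_ug_protein(6.0, 8.0)  # 0.75 pg per ug protein
print(fold_over_vehicle(treated, vehicle))     # 3.0
```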



WO 96/38590 PCT/US96/08197

Example 4: DEMONSTRATION THAT BONE CELLS
TRANSFECTED WITH BMP PROMOTERS CAN
BE USED TO SCREEN FOR OSTEOGENIC AGENTS

To demonstrate that the present invention is useful in evaluating potential
osteogenic agents, a random array of chemical compounds from a chemical library obtained
commercially was screened. It was found that approximately 1 in 100 such compounds
screened produces a positive response in the present assay system compared with the
positive control, recombinant BMP-2, which is known to enhance BMP-2 transcription.
Compounds identified from the random library were subjected to detailed dose-response
curves, to demonstrate that they enhance BMP messenger RNA expression, and that they
enhance other biological effects in vitro, such as expression of structural proteins including
osteocalcin, osteopontin and alkaline phosphatase, and enhance bone nodule formation in
prolonged primary cultures of rodent calvarial osteoblasts.

Compounds identified in this way can be tested for their capacity to stimulate bone
formation in vivo in mice. To demonstrate this, the compound can be injected locally into
subcutaneous tissue over the calvarium of normal mice and then the bone changes are
followed histologically. It has been found that certain compounds identified by the present
invention stimulate the formation of new bone in this in vivo assay system.

The effects of compounds are tested in ICR Swiss mice, aged 4-6 weeks and
weighing 13-26g. The compound at 20mg/kg or vehicle alone (100µl of 5% DMSO and
phosphate-buffered 0.9% saline) is injected three times daily for 7 days. The injections
are given into the subcutaneous tissues overlying the right side of the calvaria of five mice
in each treatment group in each experiment.
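The amount of compound delivered per injection depends on body weight, as the sketch below illustrates for the stated weight range. It assumes that 20mg/kg is the dose given at each injection rather than a daily total, which the text does not specify; the function names are hypothetical.

```python
def dose_per_injection_mg(body_weight_g, dose_mg_per_kg=20.0):
    """Compound mass (mg) for one injection at a weight-based dose."""
    return dose_mg_per_kg * body_weight_g / 1000.0

def total_injections(per_day=3, days=7):
    """Number of injections over the dosing period."""
    return per_day * days

# Lower and upper ends of the stated 13-26g weight range
for weight_g in (13, 26):
    print(weight_g, dose_per_injection_mg(weight_g))
print(total_injections())  # 21
```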

Mice are killed by ether inhalation on day 14, i.e. 7 days after the last injection of
compound. After fixation in 10% phosphate-buffered formalin, the calvariae are examined.
The occipital bone is removed by cutting immediately behind and parallel to the lambdoid
suture, and the frontal bone is removed by cutting anterior to the coronal suture using a
scalpel blade. The bones are then bisected through the coronal plane and the 3- to 4mm
strips of bone are decalcified in 14% EDTA, dehydrated in graded alcohols, and embedded
in paraffin. Four 3µm thick nonconsecutive step sections are cut from each specimen and
stained using hematoxylin and eosin.

Two representative sections from the posterior calvarial strips are used.
Histological measurements are carried out using a digitizing tablet and the Osteomeasure
image analysis system (Osteometrics Inc., Atlanta, GA) on the injected and noninjected
sides of the calvariae in a standard length of bone between the sagittal suture and the
muscle insertion of the lateral border of each bone. Measurements consist of (1) total
bone area (i.e., bone and marrow between inner and outer periosteal surfaces); (2) area of
new woven bone formed on the outer calvarial surface; (3) the extent of osteoblast-lined
surface on the outer calvarial surface; (4) the area of the outer periosteum; and (5) the
length of calvarial surface. From these measurements, the mean width of new bone and
periosteum and the percentage of surface lined by osteoblasts on the outer calvarial surface
can be determined.
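The derived quantities named above can be computed from the primary measurements as simple ratios. The sketch below assumes that mean width is taken as area divided by the measured surface length; that definition, the function names and the sample values are illustrative assumptions, not output of the Osteomeasure system.

```python
def mean_width(area_mm2, surface_length_mm):
    """Mean width (mm) of a layer, taken as its area divided by the surface length."""
    if surface_length_mm <= 0:
        raise ValueError("surface length must be positive")
    return area_mm2 / surface_length_mm

def percent_osteoblast_lined(osteoblast_surface_mm, surface_length_mm):
    """Percentage of the calvarial surface that is lined by osteoblasts."""
    if surface_length_mm <= 0:
        raise ValueError("surface length must be positive")
    return 100.0 * osteoblast_surface_mm / surface_length_mm

# Hypothetical measurements for one section (illustrative numbers only)
length_mm = 4.0
print(mean_width(0.5, length_mm))                # mean new bone width: 0.125
print(percent_osteoblast_lined(1.0, length_mm))  # 25.0
```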

By reference to the above disclosure and examples, it is seen that the present
invention provides a new cell-based assay for identifying and evaluating compounds which
stimulate the growth of bone. Also provided in accordance with the present invention are
promoter regions of bone morphogenetic protein genes, and a system for identifying
osteogenic agents utilizing such promoters operatively linked to reporter genes in
expression vectors.

The present invention provides the means to specifically identify osteogenic agents
which stimulate bone cells to produce bone growth factors in the bone morphogenetic
protein family. These osteogenic agents are shown to be useful to increase the activity of
the promoters of genes of members of the BMP family and other bone growth factors
normally produced by bone cells.



Example 5: RESEQUENCING OF THE BMP-2 5' FLANKING REGION

The BMP-2 5' flanking region described in Example 2 was resequenced. The
nucleotide sequence of the 5' flanking region of the mouse BMP-2 gene is provided in
Figure 11. The sequence information in Figure 11 corrects sequencing errors that are
present in Figures 5 and 9. The nucleotide sequence of Figure 11 replaces bases -2736 to
+119 provided in Figure 5 and bases 1 to 2855 provided in Figure 9. The non-nucleotide
sequence information provided in Figure 5 is applicable to the corresponding bases in
Figure 11 where such bases are present.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application was
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity and understanding, it will be apparent to
those of ordinary skill in the art in light of the teaching of this invention that certain changes 
and modifications may be made thereto without departing from the spirit or scope of the 
appended claims. 
(2) INFORMATION FOR SEQ ID NO:1:

(A) LENGTH: 2310 base pairs

(B) TYPE: nucleic acid

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 768..1991
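The CDS location given for SEQ ID NO:1 can be cross-checked against the 408 amino acid length stated for SEQ ID NO:2. The following is a minimal arithmetic sketch with hypothetical function names; it assumes the location is an inclusive 1-based range that excludes the stop codon.

```python
def cds_length(start, end):
    """Number of nucleotides in an inclusive 1-based CDS range."""
    return end - start + 1

def codon_count(start, end):
    """Number of complete codons in the CDS; raises if the length is not a multiple of 3."""
    length = cds_length(start, end)
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3

# CDS location stated for SEQ ID NO:1; SEQ ID NO:2 is stated to be 408 amino acids
print(cds_length(768, 1991))   # 1224
print(codon_count(768, 1991))  # 408
```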






(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GGGAGGAAGG GAAGAAAGAG AGGGAGGGAA AAGAGAAGGA AGGAGTAGAT GTGAGAGGGT 60 

GGTGCTGAGG GTGGGAAGGC AAGAGCGCGA GGCCTGGCCC GGAAGCTAGG TGAGTTCGGC 120 

ATCCGAGCTG AGAGACCCCA GCCTAAGACG CCTGCGCTGC AACCCAGCCT GAGTATCTGG 180

TCTCCGTCCC TGATGGGATT CTCGTCTAAA CCGTCTTGGA GCCTGCAGCG ATCCAGTCTC 240

TGGCCCTCGA CCAGGTTCAT TGCAGCTTTC TAGAGGTCCC CAGAAGCAGC TGCTGGCGAG 300 

CCCGCTTCTG CAGGAACCAA TGGTGAGCTC GAGTGCAGGC CGAAAGCTGT TCTCGGGTTT 360 

GTAGACGCTT GGGATCGCGC TTGGGGTCTC CTTTCGTGCC GGGTAGGAGT TGTAAAGCCT 420 

TTGCAACTCT GAGATCGTAA AAAAAATGTG ATGCGCTCTT TCTTTGGCGA CGCCTGTTTT 480 

GGAATCTGTC CGGAGTTAGA AGCTCAGACG TCCACCCCCC ACCCCCCGCC CACCCCCTCT 540 

GCCTTGAATG GCACCGCCGA CCGGTTTCTG AAGGATCTGC TTGGCTGGAG CGGACGCTGA 600 

GGTTGGCAGA CACGGTGTGG ATTTTAGGAG CCATTCCGTA GTGCCATTCG GAGCGACGCA 660 

CTGCCGCAGC TTCTCTGAGC CTTTCCAGCA AGTTTGTTCA AGATTGGCTC CCAAGAATCA 720 



TGGACTGTTA TTATGCCTTG TTTTCTGTCA GTGAGTCCAG AGACACC ATG ATT CCT 776
Met Ile Pro
1

GGT AAC CGA ATG CTG ATG GTC GTT TTA TTA TGC CAA GTC CTG CTA GGA 824
Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val Leu Leu Gly
5 10 15

GGC GCG AGC CAT GCT AGT TTG ATA CCT GAG ACC GGG AAG AAA AAA GTC 872
Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys Lys Lys Val
20 25 30 35

GCC GAG ATT CAG GGC CAC GCG GGA GGA CGC CGC TCA GGG CAG AGC CAT 920
Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly Gln Ser His
40 45 50

GAG CTC CTG CGG GAC TTC GAG GCG ACA CTT CTA CAG ATG TTT GGG CTG 968
Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met Phe Gly Leu
55 60 65

CGC CGC CGT CCG CAG CCT AGC AAG AGC GCC GTC ATT CCG GAT TAC ATG 1016
Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro Asp Tyr Met
70 75 80

AGG GAT CTT TAC CGG CTC CAG TCT GGG GAG GAG GAG GAG GAA GAG CAG 1064
Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu Glu Glu Gln
85 90 95

AGC CAG GGA ACC GGG CTT GAG TAC CCG GAG CGT CCC GCC AGC CGA GCC 1112
Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala Ser Arg Ala
100 105 110 115

AAC ACT GTG AGG AGT TTC CAT CAC GAA GAA CAT CTG GAG AAC ATC CCA 1160
Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu Asn Ile Pro
120 125 130

GGG ACC AGT GAG AGC TCT GCT TTT CGT TTC CTC TTC AAC CTC AGC AGC 1208 
Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn Leu Ser Ser 
135 140 145 

ATC CCA GAA AAT GAG GTG ATC TCC TCG GCA GAG CTC CGG CTC TTT CGG 1256
Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg Leu Phe Arg
150 155 160

GAG CAG GTG GAC CAG GGC CCT GAC TGG GAA CAG GGC TTC CAC CGT ATA 1304
Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe His Arg Ile
165 170 175

AAC ATT TAT GAG GTT ATG AAG CCC CCA GCA GAA ATG GTT CCT GGA CAC 1352
Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val Pro Gly His
180 185 190 195

CTC ATC ACA CGA CTA CTG GAC ACC AGA CTA GTC CAT CAC AAT GTG ACA 1400
Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His Asn Val Thr
200 205 210

CGG TGG GAA ACT TTC GAT GTG AGC CCT GCA GTC CTT CGC TGG ACC CGG 1448 
Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg Trp Thr Arg 
215 220 225 

GAA AAG CAA CCC AAT TAT GGG CTG GCC ATT GAG GTG ACT CAC CTC CAC 1496
Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr His Leu His
230 235 240

CAG ACA CGG ACC CAC CAG GGC CAG CAT GTC AGA ATC AGC CGA TCG TTA 1544
Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser Arg Ser Leu
245 250 255

CCT CAA GGG AGT GGA GAT TGG GCC CAA CTC CGC CCC CTC CTG GTC ACT 1592
Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu Leu Val Thr
260 265 270 275

TTT GGC CAT GAT GGC CGG GGC CAT ACC TTG ACC CGC AGG AGG GCC AAA 1640
Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg Arg Ala Lys
280 285 290

CGT AGT CCC AAG CAT CAC CCA CAG CGG TCC AGG AAG AAG AAT AAG AAC 1688
Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys Asn Lys Asn
295 300 305

TGC CGT CGC CAT TCA CTA TAC GTG GAC TTC AGT GAC GTG GGC TGG AAT 1736
Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val Gly Trp Asn
310 315 320

GAT TGG ATT GTG GCC CCA CCC GGC TAC CAG GCC TTC TAC TGC CAT GGG 1784
Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr Cys His Gly
325 330 335

GAC TGT CCC TTT CCA CTG GCT GAT CAC CTC AAC TCA ACC AAC CAT GCC 1832 
Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr Asn His Ala 
340 345 350 355 

ATT GTG CAG ACC CTA GTC AAC TCT GTT AAT TCT AGT ATC CCT AAG GCC 1880
Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile Pro Lys Ala
360 365 370

TGT TGT GTC CCC ACT GAA CTG AGT GCC ATT TCC ATG TTG TAC CTG GAT 1928
Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu Tyr Leu Asp
375 380 385

GAG TAT GAC AAG GTG GTG TTG AAA AAT TAT CAG GAG ATG GTG GTA GAG 1976
Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met Val Val Glu
390 395 400

GGG TGT GGA TGC CGC TGAOATCAGA CAGTCCGGAG GGCGOACACA CACACACACA 2031 
Gly Cys Gly Cys Arg 
405 

CACACACACA CACACACACA CACACACACA CGTTCCCATT CAACCACCTA CACATACCAC 2091 

ACAAACTGCT TCCCTATAGC TGGACTTTTA TCTTAAAAAA AAAAAAAAGA AAGAAAGAAA 2151 

GAAAGAAAGA AAAAAAATGA AAGACAGAAA AGAAAAAAAA AACCCTAAAC AACTCACCTT 2211 

GACCTTATTT ATGACTTTAC GTGCAAATGT TTTGACCATA TTGATCATAT TTTGACAAAT 2271 

ATATTTATAA AACTACATAT TAAAAGAAAA TAAAATGAG 2310



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 408 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ile Pro Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val
1 5 10 15

Leu Leu Gly Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys
20 25 30

Lys Lys Val Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly
35 40 45

Gln Ser His Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met
50 55 60

Phe Gly Leu Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro
65 70 75 80

Asp Tyr Met Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu
85 90 95

Glu Glu Gln Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala
100 105 110

Ser Arg Ala Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu
115 120 125

Asn Ile Pro Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn
130 135 140

Leu Ser Ser Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg
145 150 155 160

Leu Phe Arg Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe
165 170 175

His Arg Ile Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val
180 185 190

Pro Gly His Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His
195 200 205

Asn Val Thr Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg
210 215 220

Trp Thr Arg Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr
225 230 235 240

His Leu His Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser
245 250 255

Arg Ser Leu Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu
260 265 270

Leu Val Thr Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg
275 280 285

Arg Ala Lys Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys
290 295 300

Asn Lys Asn Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val
305 310 315 320

Gly Trp Asn Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr
325 330 335

Cys His Gly Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr
340 345 350

Asn His Ala Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile
355 360 365

Pro Lys Ala Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu
370 375 380

Tyr Leu Asp Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met
385 390 395 400

Val Val Glu Gly Cys Gly Cys Arg
405

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2688 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60 

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240 

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360 

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660 

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900 

AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960 

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 
AAGCTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380 

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500 

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560 

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860 

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980 

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040 

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC GGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGOAT 2280 

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400 

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCCC AGCCTAAGAC GCCTGCGCTG 2520 

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580 

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640 

CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAG 2688 
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(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2875 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60

AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120

CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180

GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240

TTOATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300

AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360

GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420

TATAGGTTCG GAGTTTCTTG CTTTGCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480

AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540

CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600

TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660

GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720

CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780

TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840

CACQTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900

AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960

CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020

GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080

AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG 1140

AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTCACTC 1200

TGGAGTAGGT GGGTGTGGAA GGGAAOATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260

TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG ACACACACAC ACACACACAC 1320
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ACACACACAC ACACATCACT AGAAGOGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380 

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440 

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500 

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560 

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620 

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800 

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920 

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCTCGTCG CCGCCGGAGT 1980 

CCTCGCCCTG CCGCGCAGAG CCCTGCTCGC ACTGCGCCCG CCGCGTGCGC TTCCCACAGC 2040

CCGCCCGGGA TTGGCAGCCC CGGACGTAGC CTCCCCAGGC GACACCAGGC ACCGGGACGC 2100 

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160 

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220 

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280 

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340 

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400 

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520 

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580 

CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640 

CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700 

CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760 

GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820 

CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCC 2875

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CCCGGCAAGT TCAAGAAG 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15144 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60 

AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120 

CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180 

GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240 

TTGATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300 

AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360 

GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420 

TATAGGTTCG GAGTTTCTTG CTTTGCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480 

AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540 

CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600 

TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660 

GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720 

CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780 

TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840

CACGTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900

AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960

CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020

GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080

AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG 1140

AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 1200

TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260

TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG ACACACACAC ACACACACAC 1320

ACACACACAC ACACATCACT AGAAGGGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG 1980

TCCTCGCCCT GCCGCGCAGA GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG 2040

CCCGCCCGGG ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC 2100

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580

CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640
CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700

CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760

GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820

CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCCGATCA 2880

CCTCTCTTCC TCAGCCCGCT GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC 2940

GGAGAAGGAG GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCCTC 3000

GGACAGAGCT TTTTCCATGT GGAGACTCTC TCAATGGACG TGCCCCCTAG TGCTTCTTAG 3060

ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG GACCCGGGGT TGGCTGGCGG 3120

GTGACACCGC TTCCCGCCCA ACGCAGGGCG CCTGGGAGGA CTGGTGGAGT GGAGTGGACG 3180

TAAACATACC CTCACCCGGT GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA 3240

CCCCAGATCC CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 3300

GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC AGAAATCCAT 3360

TTAAGAGTAT GGCCAGTAGA TTTTACTAGT TCATTGCTGA CCAGTAAGTA CTCCAAGCCT 3420

TAGAGATCCT TGGCTATCCT TAAGAAGTAG GTCCATTTAG GAAGATACTA AAAGTTGGGG 3480

TTCTCCATGT GTGTTTACTG ACTATGCGAA TGTGTCATAG CTTACACGTG CATTCATAAA 3540

CACTATCTAT TTAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 3600

GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC GAGGCCACAA 3660

CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG CTTACTGAAT CTACAAGTTT 3720

GATATGCTCA ACTACCAGGA AATTGTATAC AGCGCCTCTA AGGAAGTCAC TTGTGCATTT 3780

GTGTCTGTTA ATATGCACAT GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG 3840

ACCAACCTAT GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 3900

GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA AGTAAGAAGT 3960

CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC TATGGGAGCC GAGGCGCGGG 4020

GGCGGCGGAG GACTGGGCGG GGAACGTGGG TGACTCACGT CGGCCCTGTC CGCAGGTCGA 4080

CCATGGTGGC CGGGACCCGC TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG 4140

GCGCGGCCGG CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 4200

CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG CTCAGCATGT 4260

TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT GGTGCCCCCC TATATGCTAG 4320

ATCTGTACCG CAGGCACTCA GGCCAGCCAG GAGCGCCCGC CCCAGACCAC CGGCTGGAGA 4380

GGGCAGCCAG CCGCGCCAAC ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG 4440 

GTGGCGGGGC GGGGACGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGG CCTCCACTAG 4500 

CACAGTAGAA GGCCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC CAGGGATTCC 4560 

CCGCTTGTGA GTCCTCACCC TTTCCTGGCA AGTAGCCAAA AGACAGGCTC CTCCCCCTAG 4620 

AACTGGAGGG AAATCGAGTG ATGGGGAAGA GGGTGAGAGA CTGACTAGCC CCTAGTCAGC 4680 

ACAGCATGCG AGATTTCCAC AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG 4740 

CTCAGATCTG TGACTTGTGT TCACGCTGTA GTTTTAAGCT AGGCAGAGCA AGGGCAGAAT 4800

GTTCGGAGAT AGTATTAGCA AATCAAATCC AGGGCCTCAA AGCATTCAAA TTTACTGTTC 4860 

ATCTGGGCCT AGTTTGAAAG ATTTCTGAAT CCCTATCTAA TCCCCGTGGG AGATCAATTC 4920 

CAGAATTCGT CATATTGTTT CCACAATGAC CTTCGATTCT TTGCTTAAAT CTTAAATCTC 4980 

CAAGTGGAGA CAGCGCAACG CTTCAGATAA AAGCCTTTCT CCCACTGCCT GCTACCTTCC 5040 

TAGGCAAGGC AATGGGGTTT TTAAACAAAT ATATGAATAT GATTTCCCAA GATAGAATAA 5100 

TGTTGTTTAT TTCAGCTGAA ATTTCCTGGA TTAGAAAGGC TGTAGAGGCC TATTGAAGTC 5160 

TCTTGCACCG ATGTTCTGAA AGCAGTTAGT AAAAAATCAT GACCTAGCTC AATTCTGTGT 5220 

GTGCCACTTT CAATGTGCTT TTGACTTAAT GTATTCTCCA TAGAACATCA GTTCCTTCAA 5280 

GTTCTAGAAG AATTCAGATT TAAAGTTTTG CTTTGCCTTG CTGAGGGGAT AAATTTTAAG 5340 

TAGAAATCTA GGCTCTGAAA TGATAGCCCA ACCCCATCTC CAGTAAGGGA TGACTGACTC 5400 

AAACCTTGAG AAGTCTGGGT GATAATAGGA AAAGTCCACA AGCAGGTCAC AGAGCGCGAG 5460

ATGGATCTGT CTTGAGGCAG CCAATGGTTA TGAAGGGCAC TGGAAATCCA TCTCTTTCAA 5520 

ACTGGTGTCT AGGGCTTTCT GGGAGCAAAG CTTAGACCAC ATTCTGCTCC TCAAGGTTTG 5580 

CCTACTGAAA GGAGGGAGAT TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT 5640 

GGGCTTAGCT AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 5700 

CAAACCTACT ATCCAGCACA GGTGTTTTTC CCACTGCCTC TGGAGATATA GCAAGAAAAC 5760 

CATATATTCA TGTATTTCCT TATTAGTCTT TTCTAACGTG AAAATTATTC CTGACCTATA 5820 

AAAAATGAAG GAGGTATTTT ATCTTAACTA AGCTAAAAGA ATCGCTTAAG TCAATTGAAA 5880 

CTCAAAAATC CAATTGAATG AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC 5940 

CTTTGGAAAT AGCTTGATAA AAACACAGAC AAAACAAAGT CTGTGTGCTT ATTTGAAAAC 6000 

TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT CTGTTGTAAA 6060

ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTGCA CTAACTTTAT ACCTTGGTGC 6120

AATTAGATGT AATGTTTACT GTAAATTTCA GGAAAACCAT lUTimTr TGGTCATGAT 6180 

CAGGTACACA TGGCATTTGG GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG 6240

TTTGTTTGTT TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 6300

TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT GAGACAAAAC 6360 

ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG ACAAAATCTC CTGTTGAACA 6420 

TTAAGACCAT GGATTTTTAT CCAGGAGAGC CCAGGCTTTG CTGAATCACC ACCCTCCAAC 6480

CCCACTCCAA GGTCACCGAA GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT 6540 

TGATTGACTC CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 6600 

ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCTTT llUUUm ' ATGGAATAAA 6660 

CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA GAGATCCTGT TAGCAGAAGT 6720

GCAACTGCCC AGAAACTAGC CACAGGCTAG GATATTCCAA AGTACAACTC TAAAGTATGG 6780 

TCCATCCTAA ATTCTAGCAT GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT 6840 

CTGGCTATTG CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 6900 

ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC TTTCAATCCC 6960 

TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG CCTTGTGGCA TTTCGCTTGG 7020 

TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG GTTCCCTTGT AGGATCTGTG TTCAGGAGAA 7080 

CAGGGACCTT GGCAGGTTAG TGACAACTAC CAAACCCTGC TTTCCTTCCC TCCCACTTCC 7140 

TTTGTTGCCT TAAAAATTAA ACCTTAACTC TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 7200 

TGTCATTTAC TTTATTTATT TGTCATGTAC TTTATCCTGT AGAAAATCAC AGTGTGGCCC 7260 

AAAGCCCCTT GAATCTTGTT GCAGCGGTGA GATGCAGCTG CTGATCTGGA ATAGCCTTAG 7320

GCTGTGTGTT TGATCACAAT GCTTTCTGTC CAAAAGTGTG CAAATCCTCC AAGGTTAATG 7380 

ATAACTTTTG AAATGAAACT CACCCTACTT TAGGGCAAAC AAGTAGCCAC AGAGAGCAGG 7440 

ATCTAAACAA GGTCTGGTGT CCCATTTGGC TX3TGTCCCTT CAATTTTCTG TTCATTTAGC 7500 

TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT AAGTTTTGAT CTTCAGGGCA AAACTCAATC 7560 

TTCAGTTACC ATGGTATCAG GTACCAATTC CTAGTGATTT GTGCTATGGC TTAGGATTTG 7620 

ATTTCTCTCC TACATTAGGT AATATCTTTC AATGGCTAGA ACTTGGGCAT TGCAGTACAC 7680 

TCAAGTTAAC AGTTCTGTGA CCTAAGGAAG TCACATAACC TCTCTGAATT CTCTACTGTT 7740 

TCATTCACAA AATGGAGAAA ATCATGGCTC TTTCTTAATG TGCGAATTCA TAGAAAGGTG 7800

ATGACACCAG ATTTGGCAGA AGGAAGGAAA GGAAGGAAGG AAGAAAGAAA GAAAGAAAGA 7860

AAGAAAGAAA GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG GAGAGAGAGA GAAGGGAAGG 7920 

GAAAGGGAAA GGGAAAGGAA AGAAAAGAAA GGAAGGAAGA AAAGGAAGGA AGGAAGGAAA 7980

GAAGGAAGGA AGGAAAAGAA AGAGAAGAAA GCATTCAGCA TATGAACTAA TGTTTCCTGG 8040 

TGACTTTTTA TATCATATCC TTGTTCTAGG AAGTGGCCCT AGCCATATCT TTTGGGTTAT 8100 

TTTGAGGTAG AGGATAATCA ACATAGTGTA GAACATTAAA TCTGGGTTTT GTTTCTAGAA 8160

GAGGCTAGAA TGGCATGGCT GTCCCACTTG CTCCTCTTTC AGGCAGTATG GCAGCCACCA 8220 

TTCTCTCTGT AAGATCTAGG AGGCTGACAC TCAGGTTGGA GACAGGTCAG AATCCTGAAA 8280 

TCACTTAGCA AGTTCAGCTG ATTCAACAAG GGATATTTAC AGAGAATTAA CAGCTATTCC 8340 

AGCTTCCAAA AAGTGTACAT TACCTACTCT GTATTTTCAG AACCCCAGGT TTGCTGTGAT 8400 

AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGATATTTT CATTTTCCAC 8460 

CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA GGTGTTGTGT GTCCTGACCC 8520

TGAGGAAGTT GGCCTTGTTG AGGTCTTCTG TAAATTCTTG AATTCTCTGT ATAATTTCAA 8580 

TGAATAGTCA TGTTTGATAC CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG 8640 

CTGATGGAAA CGCTGCTGAA AGACTAGAGA TTGCTCTTTC CTTTGGCATC TGTCTTGGGT 8700 

AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT CTTACCTGCA 8760 

GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTGCAG TTGAAAGATG ACAATTTCAT 8820 

ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG AGATATTATT CTGTTACTTC AATGACCTGT 8880 

CTCCATTATT TATCTTGAGG CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG 8940 

AAGGCCCTGG GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 9000 

ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT ACTGTTTTTA 9060 

TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA CACCAGAATG GTGATGAAAG 9120 

GCTTTTTATA ACAGAAGCTA AATGCAGTCC TTCATACTTC ATGGAATGCC CCTGTCCTAA 9180 

AGTACCATTA ACCGATAGTG GAGTCAGAAC ATAAATGGCT CCCCAAAGGT ATCACCAAGA 9240 

ACTTTTGGCA AACAGATGCA AGAGGATTAT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 9300 

CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG CTTAACATAG 9360 

CATGAACCCT CATGTGTTGG CCAACATTAA GGCTTTTTCT ATAAAAGTCT CCTCCTTCAT 9420 
CAGTATACGC TCGAGTATGA AAAGCATCCT TTTAAACCTT GACTCTGTGT GGTCCAGAAA 9480

CAGCAGCATC CCTTGCTTAA GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA 9540

CCGGCTGATG TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 9600

TGAGGATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA TCGTGAATTG 9660 

CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC TATGTCCCTG TGGATGCCTT 9720

TGTCTCTAGA GTTCTGAGCA AAATGGTAGG GTGTGCTTTG CAAAATGTCA TCATTGATGT 9780

TGAATTTCAA AGTCTTTAAT TAAGGGGCTG AAATCTGTAT ATTGAGATTT GTAAATCATC 9840

TAAATTGTAG AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 9900 

TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTGAGTCA TTAGCTCTCT ACTAAGTACA 9960

GTGCTGACTT TTTTAAAATT AAAGTCTGTG AATTCCAAAG AAGTGTTTCA CTATTTCCTC 10020 

CATTATTATA GCTACCTAGA AGCTATGTTC ATATATTGGA TTAAAAACGT AGCAATTACA 10080 

AAGTTAATGT GGCCATATAG AAAAGGGAAA AGAAACTCCG CTTTCACTTT AATATATATA 10140 

TGTGTGTGTG TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 10200 

TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC TCAGGCCATG 10260

GACAGAGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT TAATTTCTGG TTGCAGGTTT 10320

TTATGTAGTT TAACTTAAAC CATGCACTGA AGTTTTAAAT GCTCGTAAGG AATTAAGTTA 10380 

CCATTGGCTC TCTTACCAAA TGCGTTTCTT TTTTCTCTCC ACCCTGATCA AACTAGAAGC 10440 

CGTGGAGGAA CTTCCAGAGA TGAGTGGGAA AACGGCCCGG CGCTTCTTCT TCAATTTAAG 10500 

TTCTGTCCCC AGTGACGAGT TTCTCACATC TGCAGAACTC CAGATCTTCC GGGAACAGAT 10560 

ACAGGAAGCT TTGGGAAACA GTAGTTTCCA GCACCGAATT AATATTTATG AAATTATAAA 10620 

GCCTGCAGCA GCCAACTTGA AATTTCCTGT GACCAGACTA TTGGACACCA GGTTAGTGAA 10680 

TCAGAACACA AGTCAGTGGG AGAGCTTCGA CGTCACCCCA GCTGTGATGC GGTGGACCAC 10740 

ACAGGGACAC ACCAACCATG GGTTTGTGGT GGAAGTGGCC CATTTAGAGG AGAACCCAGG 10800 

TGTCTCCAAG AGACATGTGA GGATTAGCAG GTCTTTGCAC CAAGATCAAC ACAGCTGGTC 10860

ACAGATAAGG CCATTGCTAG TGACTTTTGG ACATGATGGA AAAGGACATC CGCTCCACAA 10920 

ACGAGAAAAG CGTCAAGCCA AACACAAACA GCGGAAGCGC CTCAAGTCCA GCTGCAAGAG 10980

ACACCCTTTG TATGTGGACT TCAGTGATGT GGGGTGGAAT GACTGGATCG TGGCACCTCC 11040 

GGGCTATCAT GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 11100

CTCCACTAAC CATGCCATAG TGCAGACTCT GGTGAACTCT GTGAATTCCA AAATCCTTAA 11160

GGCATGCTGT GTCCCCACAG AGCTCAGCGC AATCTCCATG TTGTACCTAG ATGAAAATGA 11220 

AAAGGTTGTG CTAAAAAATT ATCAGGACAT GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA 11280

CAGCAAGAAT AAATAAATAA ATATATATAT TTTAGAAACA GAAAAAACCC TACTCCCCCT 11340

GCCTCCCCCC CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 11400

GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC GAAAAGAAGT 11460 

TGGGAAAACA AATATTTTAA TCAGAGAATT ATTCCTTAAA GATTTAAAAT GTATTTAGTT 11520 

GTACATTTTA TATGGGTTCA ACTCCAGCAC ATGAAGTATA AGGTCAGAGT TATTTTGTAT 11580

TTAXTTACTA TAATAACCAC TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA 11640 

TCAGAAGAAA TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 11700

TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA AAGTGCTTGC 11760 

ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT TCACAAGTTC AAGTCCAAAA 11820 

AAAAAAAAAA AGGATAATCT ACTTTGCTGA CTTTCAAGAT TATATTCTTC AATTCTCAGG 11880 

AATGTTGCAG AGTGGTTGTC CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG 11940 

GATAAGAACC AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 12000 

AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG AAAGTGAGTG 12060 

AATGGtrrCTT GTTCTTTCTT AAGCCTATAA TCCTTCCAGG GGGCTGATCT GGCCAAAGTA 12120 

CTAAATAAAA TATAATATTT CTTCTTTATT AACATTGTAG TCATATATGT GTACAATTGA 12180 

TTATCTTGTG GGCCCTCATA AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC 12240 

AGCAATCTCT CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 12300 

TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG TCAGCTTGCT TTTCTTTCCA 12360 

GGATATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA TTTCAGATAA TAAATATCAA 12420 

AATTCTGGCA TTTTCATCCC TATAAAAACC CTAAACCCCG TGAGAGCAAA TGGTTTGTTT 12480 

GTGTTTGCAG TGTCTACCTG TGTTTGCATT TTCATTTCTT GGGTGAATGA TGACAAGGTT 12540 

GGGGTGGGGA CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 12600 

GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG AAAGCCAATG 12660 

CCTAAATCAA AGCAAAGTTT GCAGAACCCA AGGTAAAGTT CCAGAGATGA TATATCATAC 12720 

AACAGAGGCC ATAGTGTAAA AAAATTAAAG AATGTCTGAT CAGCGTCTCA GCACATCTAC 12780 

CAATTGGCCA GATGCTCAAA CAGAGTGAAG TCAGATGAGG TTCTGGAAAG TGAGTCCTCT 12840

ATGATGGCAG AGCTTTGGTG CTCAGGTTGG AAGCAAAACC TAGGGAGGGA GGGCTTTGTG 12900 

GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG TCAGTTTCCG 12960

GAGTGTGTGT CCTGTAGCCC TCCGTCATGG TTGAAGCCCA GGTCTCACCT CCTCTCCTGA 13020

CCCGTGCCTT AGAACTGACT TGGAAAGCGG TGTGCTTACA GCAAGACAGA CTGTTATAAT 13080

TAAATTCTTC CCAAGGACCT CCGTGCAATG ACCCCAAGCA CACTTACCTT CGGAAACCTT 13140

AAGGTTCTGA AGATCTTGTT TTAAATGACT ACCCTGGTTA GCTTTTGATG TGTTCCTTAT 13200

CCCTTTAGTT GTTGCACAGG TAGAAACGAT TAGACCCAAC TATGGGTAGC CTTGTCCTCC 13260

TGGTCCTTCA GTCATTCTCT AATGTCTCTT GCTTGCCATG GGCACTGTAA CAAACTGCAA 13320

TCTTAACATC TTATAAAATG AATGAACCAC ATATTTACAT CTCCAAGTCC TCCAGATGGG 13380

AGTGCGATCA TTCCATAAGG ATCCCACCTT CTGGCAGGTC TATCCAGTAC ATATTTTATG 13440

CTTCATTGGT CTTGATTTTC TTGGCTAAAA TTACTTGTAG CACAGCAGGC CCCATGTGAC 13500

ATATAGGTAT ATACATACAT GTATGTGCAT ATAGTGTGTA CATGTTCTAA TTTATACATA 13560

GCTATGTGAA GATTATGTTA CATATGTAGA TGGTCGCACT TCTGATTTCC ATTTAGGTTC 13620

AGAGAGAGAC GTCACAGTAA ATGGAGCTAT GTCATTGGTA TATCCCCGAG TGGTTCAGGT 13680

GTTCTCTCTA TTTTTTTAAG ATGGAGAACA CTCATCTGTA CTATCGAAAA CTGAGCCAAA 13740

TCACTTAGCA AATTTCTAGT CACTGCCTTG CTGTTAAGAT ACTGATTCAC TGGGTGCTGA 13800

CATGCTGAGC CCTGCCTACT TTTGCATGAA GGACAAGGAA GAGAGCTTGC AGTTAAGAAT 13860

GGTATATGTG GGGCTAGGGG GCGGCGTATA GACTGGCATA TATGTGAAGG AAGGTCACAA 13920

ACAGCCTGCA CTAATTTCCC TTTTCTGGTT TTATGTCTTG GCAGGGGAAA GGACAGGTAG 13980

GGTGGGGTTG AGGGGGAGGG CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT 14040

TCACCCCGCC ACCATATCTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACCA 14100

GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGG CAAAGTTCTA 14160

AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA GGAATGCCTC CAGGAAAGCA 14220

AAAAGCTTGA TGTGTGTACA GCCACGTGGT GGAGTCCTGC CACCCTATGA TTCCTGTCCC 14280

AGTGGTCGTG TGGGGCCTGA GATCCTGAAT TTCTAATGAG CTCCCAGTAC GCCCTGACTC 14340

ACTGTGCCAG AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT 14400

ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA TGAGAAGAAA 14460

CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC CCTGCAGTTT TGTTTTTTGT 14520

TTTTTTTTTT TCAAAGACAT TCTTTTTCCC AACAAGATGA GTGGCAATCT TATGTTCTAG 14580

CCACTCTTAG ACATGAAAAC ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGCTTG 14640

CTTGGGCACG CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATCAGGG 14700

AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAG TCAGTGTGCT 14760

GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA AAGGGATCTC TTTGAAGGCA 14820

GAGGGTAGAT GTCGTATGGT TGAAGCATTT GTTTATACTA AAATGATGCT TGACTTTTTT 14880 

TCTAAGTTAT AAGACAGTAC ACTGTATAAG TTCATTGAAC CTAGAGGGTG GCATAGGACT 14940

CCAAATCTGG TATGGGAGGT TTGTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT 15000 

GGAATAAAGT GCTTATGTGA ATGGGCTTAA GCTAGGGAAA AAAATGGGTT TCCCTCTGCA 15060 

AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC ATGAATGCCA CTTGTTAGCA 15120 

GATGCCCTGT GGGGATCCGA ATTC 15144 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9299 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60 

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360 

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATCTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660 

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900

AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 

AAGGTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140 

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380 

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560 

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980 

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC GGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGGAT 2280

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCGC AGCCTAAGAC GCCTGCGCTG 2520 

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580 

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640

CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAGGA GGGCAACCTG 2700

GAGAGGGGCG CTATTCTGAG GATTCGAGGT GCACCCGTAG TAGAAGCTGG GGATGGGGCT 2760 

CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC CTTCTCCAAC AGTGTTGGAG 2820

GTGGGATGAT GGAGGCTAAA AGGCACCTCC ATATATGTTA CTGCGTCTAT CAACCTACTT 2880 

TAGGGAGGTG CGGGCCAGGA GAGGCGGGAA GGAGAGAAGG CCTTGGAAGA GAGGTCATTG 2940 

GGAAGAACTG TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 3000 

GGAGTCAACT CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT TAGGAGAGAG 3060 

AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC TGCTTTGCTC TAAAGCGCGA 3120 

CCCGGGATAG AGAGGCTTCC TTGAGCGGGG TGTCACCTAA TCTTGTCCCC AACGCACCCC 3180 

CTCCCAGCCC CTGAGAGCTA GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC 3240 

TATTTTCTTA GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 3300 

AGGGGTGTAA GGTTCCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC TGTGCGGGTC 3360 

TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT AATGAGGAGC TTGCGATACT 3420

CCAAAGGGTT CGGGAATTGC GGGGTCCTTA CACGCAGTGG AGTTGGGCCC CTTTTACTCA 3480 

GAAGGTTTCC GCCACGGCTT TGGTTGATAG TTTTTTTAGT ATCCTGGTTT ATGAACTGAA 3540

GGTTTTGTGA GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 3600 

AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC TCCAGGAGCT 3660

GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG AAAATTTACT TTGATCTGTA 3720 

TTTTTTAATT AAAAAAAATC AGGGAAGAAA GGAGTGATTA GAAAGGGATC CTGAGCGTCG 3780 

GCGGTTCCAC GGTGCCCTCG CTCCGCGTGC GCGAGTCGCT AGCATATCGC CATCTCTTTC 3840 

CCCCTTAAAA GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 3900 

AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA AAGCTGCGCT 3960 

GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG GTCTCTGCAC CCAGGGCAGC 4020 

AGTGTGGGAT GGCGCTGGGC AGCCACCGCC GCCAGGAAGG ACGTGACTCT CCATCCTTTA 4080 

CACTTCTTTC TCAAAGGTTT CCCGAAAGTG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG 4140

GGGGGGGGGA GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 4200 

AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC CTCGGATGGA 4260 

CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG AGGTTAGAAC AGGCCAGGCA 4320 

TCTTAGGATA GTCAGGTCAC CCCCCCCCCC AACCCCACCC GAGTTGTGTT GGTGAATTTC 4380

TTGGAGGAAT CTTAGCCGCG ATTCTGTAGC TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA 4440

GGAAGTGGCT GTGCGGGGGT GGCGGTGGGG GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 4500 

AGAGGGAGAG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGG TTTGTAGACG CTTGGGATCG 4560 

CGCTTGGGGT CTCCTTTCGT GCCGGGTAGG AGTTGTAAAG CCTTTGCAAC TCTGAGATCG 4620 

TAAAAAAAAT GTGATGCGCT CTTTCTTTGG CGACGCCTGT TTTGGAATCT GTCCGGAGTT 4680 

AGAAGCTCAG ACGTCCACCC CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC 4740 

CGACCGGTTT CTGAAGGATC TGCTTGGCTG GAGCGGACGC TGAGGTTGGC AGACACGGTG 4800 

TGGGGACTCT GGCGGGGCTA CTAGACAGTA CTTCAGAAGC CGCTCCTTCT AACTTTCCCA 4860 

CACCGCTCAA ACCCCGACAC CCCCGCGGCG GACTGAGTTG GCGACGGGGT CAGAGTCTTC 4920 

TGGCTGAAAG TTAGATCCGC TAGGGGTCGG CTGCCTGTCG CTAGAAGCAT TATTTGGCCT 4980 

CTCGGAGACC CGTGTGGAGG AAGTGCTGGA GTGTGCGAGT GTGTTTGCGT GTGTGTGTGT 5040 

GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGCGCGCGC CCTTGGAGGG TCCCTATGCG 5100 

CTTTCCTTTT CATGGAACGC TGTCGTGAGG CTTTGGTAAA CTGTCTTTTC GGTTCCTCTC 5160 

TCGGCTGCAC TTAAGCTTTG TCGGCGCTGT AAAGAGACGC GTCTTCAAGT GCACCCTGAT 5220

CCTCAGGGTT CAGATAACCC GTCCCCX3AAC CTGGCCAGAT GCATTGCACT GCGCGCCGCA 5280 

GGTAGAGACG TGCCCCACGT CCCCTGCGTG CAGCGACTAC GACCGAGAGC CGCGCCAGTG 5340 

TGGTGTCCCG CCGAGAGTTC CTCAGAGCAG GCGGGGACAA CTCCCAGACG GCTGGGGCTC 5400 

CAGCTGCGGG CGCGGAGGTT GGCCTCGCTC GCAGGGGCTG GACCCAGCCG GGGTGGGAGG 5460

ATGGAGGAGG GGCGGGCGGG CTCTTCGGTG AGTGGGGCGG GGCCTCTGGG TCCACGTGAC 5520 

TCCTAGGGGC TGGAAGAAAA ACAGAGCCTG TCTGCTCCAG AGTCTCATTA TATCAAATAT 5580 

CATTTTAGGA GCCATTCCGT AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG 5640

CCTTTCCAGC AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT 5700 

GTTTTCTGTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC TCTGTCCTCC 5760 

CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT TTCAGTGCCC CX3CGCCCTAC 5820 

TCTCAGGCAG CGCTATGGTT CTCTTTCTGG TCCCTGCAAG GCCAGACACT CGAAATGTAC 5880 

GGGCTCCTTT TAAAGCGCTC CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG 5940 

AGCGCGAGGG ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCGCTT GAAGGGCTAA 6000 

CCACTCCCTT ACCAGTGCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC CTTGCCTCAT 6060

AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG AACAAGACTC TTTAGTGAGC 6120

ATTTTCAACG CAGCGACCAC AATGAAATAA ATCACAAAGT CACTGGGGCA GCCCCTTGAC 6180

TCCTTTTCCC AGTCACTGGA CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT 6240

CTCCCCTCCT CCTGTTCTTA ACCAGCTGGA AGTTGTGGAA ATTGGGCTGG AGGGCGGAGG 6300 

AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG TCCGAAGTGA 6360 

AGAGCGATGG CATTTTAATT CTCCCTCCGC CTCCCCCCTT TACCTCCTCA ATGTTAACTG 6420

TTTATCCTTG AAGAAGCCAC GCTGAGATCA TGGCTCAGAT AGCCGTTGGG ACAGGATGGA 6480 

GGCTATCTTA TTTGGGGTTA TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA 6540 

TTCTTACTTT CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 6600 

GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT TGCTACCCCT 6660 

ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT CGGTGTTTAT TTATTCTTTA 6720 

ACCTTCCACC CCAACCCCCT CCCCAGAGAC ACCATGATTC CTGGTAACCG AATGCTGATG 6780

GTCGTTTTAT TATGCCAAGT CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG 6840 

ACCGGGAAGA AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG 6900 

AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG GCTGCGCCGC 6960 

CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA TGAGGGATCT TTACCGGCTC 7020

CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG AGCCAGGGAA CCGGGCTTGA GTACCCGGAG 7080 

CGTCCCGCCA GCCGAGCCAA CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC 7140 

TTAGTCCTGG CGGTGTAGGG TGGGGTAGAG CACCGGGGCA GAGGGTGGGG GGTGGGCAGC 7200 

TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC ATGTTACATC 7260 

AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA CCTTCAACTT TCCAGAGCTG 7320 

CCTCTGAGGG TACTTTCTGG AGACCAAGTA GTGGTGGTGA TGGGGGAGGG GGTTACTTTG 7380 

GGAGAAGCGG ACTGACACCA CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA 7440 

TACCAAAGTC AGGGATTCTG CCCGTTTTGT TCCAAAGCAC CTACTGAATT TAATATTACA 7500 

TCTGTGTGTT TGTCAGGTTT ATCAATAGGG GCCTTGTAAT ACGATCTGAA TGTTTCCTAG 7560 

CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC TCCAGCATCA TTACTGTGTT 7620 

GGAATTTATT TTCCCTTCTG TAACATGATC AACAAGGCGT GCTCTGTGTT TCTAGGATCG 7680 

CTGGGGAAAT GTTTGGTAAC ATACTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCTTT 7740 

TTCTTTACAA CCACTTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG 7800 

GAGTCCAAGG GTGGTGGAGT AAAAGAGTTG ACACATGGAA ATTATTAGGC ATATAAAGGA 7860 

GGTTGGGAGA TACTTTCTGT CTTTGGTGTT TGACAAATGT GAGCTAAGTT TTGCTGGTTT 7920

GCTAGCTGCT CCACAACTCT GCTCCTTCAA ATTAAAAGGC ACAGTAATTT CCTCCCCTTA 7980 

GGTTTCTACT ATATAAGCAG AATTCAACCA ATTCTGCTAT TTTTTGTTTT TGTTTCTTGT 8040

TTTTGTTTTG TTTGGTTTTT TTTTTTTTTT TTTTTTTTTT GTCTCAGAAA AGCTCATGGG 8100

CCTTTTCTTT TCCCCTTTCA ACTGTGCCTA GAACATCTGG AGAACATCCC AGGGACCAGT 8160 

GAGAGCTCTG CTTTTCGTTT CCTCTTCAAC CTCAGCAGCA TCCCAGAAAA TGAGGTGATC 8220 

TCCTCGGCAG AGCTCCGGCT CTTTCGGGAG CAGGTGGACC AGGGCCCTGA CTGGGAACAG 8280

GGCTTCCACC GTATAAACAT TTATQAGGTT ATGAAGCCCC CAGCAGAAAT GGTTCCTGGA 8340 

CACCTCATCA CACGACTACT GGACACCAGA CTAGTCCATC ACAATGTGAC ACGGTGGGAA 84O0 

ACTTTCGATG TGAGCCCTGC AGTCCTTCGC TGGACCCGGG AAAAGCAACC CAATTATGGG 8460 

CTGGCCATTG AGGTGACTCA CCTCCACCAG ACACGGACCC ACCAGGGCCA GCATGTCAGA 8520 

ATCAGCCGAT CGTTACCTCA AGGGAGTGGA GATTGGGCCC AACTCCGCCC CCTCCTGGTC 8580 

ACTTTTGGCC ATGATGGCCG GGGCCATACC TTGACCCGCA GGAGGGCCAA ACGTAGTCCC 8640 

AAGCATCACC CACAGCGGTC CAGGAAGAAG AATAAGAACT GCCGTGGCCA TTCACTATAC 8700

GTGGACTTCA GTGACGTGGG CTGGAATGAT TGGATTGTGG CCCCACCCGG CTACCAGGCC 8760 

TTCTACTGCC ATGGGGACTG TCCCTTTCCA CTGGCTGATC ACCTCAACTC AACCAACCAT 8820 

GCCATTGTGC AGACCCTAGT CAACTCTGTT AATTCTAGTA TCCCTAAGGC CTGTTGTGTC 8880 

CCCACTGAAC TGAGTGCCAT TTCCATGTTG TACCTGGATG AGTATGACAA GGTGGTGTTG 8940

AAAAATTATC AGGAGATGGT GGTAGAGGGG TGTGGATGCC GCTGAGATCA GACAGTCCGG 9000

AGGGCGGACA CACACACACA CACACACACA CACACACACA CACACACACA CACGTTCCCA 9060 

TTCAACCACC TACACATACC ACACAAACTG CTTCCCTATA GCTGGACTTT TATCTTAAAA 9120 

AAAAAAAAAA GAAAGAAAGA AAGAAAGAAA GAAAAAAAAT GAAAGACAGA AAAGAAAAAA 9180 

AAAACCCTAA ACAACTCACC TTGACCTTAT TTATGACTTT ACGTGCAAAT GTTTTGACCA 9240 

TATTGATCAT ATTTTGACAA ATATATTTAT AACTACATAT TAAAAGAAAA TAAAATGAG 9299 

(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
CGGATGCCGA ACTCACCTA
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTACAAACCC GAGAACAG 
(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 
CCCGGCACGA AAGGAGAC 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GAAGGCAAGA GCGCGAGG 




(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CCCGGTCTCA GGTATCA 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CAGGCCGAAA GCTGTTC

Claims

1. A system for identifying osteogenic agents comprising a recombinant host cell 
modified to contain an expression sequence comprising a promoter derived from a gene 
encoding a bone morphogenic protein operatively linked to a reporter gene encoding an 
assayable product.

2. The system of claim 1 wherein said bone morphogenic protein is selected from 
the group consisting of the BMP-2 and BMP-4 proteins. 

3. The system of claim 1 or 2 wherein said reporter gene comprises a gene 
encoding the production of an assayable product selected from the group consisting of 
firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green fluorescent
protein, human growth hormone, alkaline phosphatase and β-glucuronidase.

4. The system of claim 3 wherein said reporter gene comprises a gene encoding 
the production of firefly luciferase. 

5. A method for identifying an osteogenic compound comprising the steps of: 

culturing the cells of any of claims 1-4 under conditions which permit expression of
said assayable product from said reporter gene; 

contacting said cells with at least one candidate compound suspected of possessing 
osteogenic activity; 

measuring the amount of assayable product produced in the presence of said 
candidate compound and comparing said amount to the amount of assayable product 
produced in the absence of said candidate compound; and 

identifying, as an osteogenic compound, a candidate compound that enhances the
amount of said assayable product when present.
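
The comparison recited in claim 5 can be illustrated with a minimal sketch. It is not part of the claimed subject matter: the function names, the well readings, and the two-fold cutoff are hypothetical and chosen only for illustration, since the claim requires only that the candidate compound enhance the amount of assayable product relative to its absence.

```python
from statistics import mean


def fold_induction(treated, untreated):
    """Ratio of mean reporter signal with the candidate to mean signal without it."""
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated signal must be positive")
    return mean(treated) / baseline


def identify_hits(readings, untreated, threshold=2.0):
    """Return candidates whose fold induction meets the (assumed) threshold."""
    return [
        name
        for name, values in readings.items()
        if fold_induction(values, untreated) >= threshold
    ]


# Hypothetical reporter readings (arbitrary light units) from replicate wells.
untreated_wells = [100.0, 110.0, 90.0]
candidate_wells = {
    "compound_A": [310.0, 290.0, 300.0],
    "compound_B": [105.0, 95.0, 100.0],
}

hits = identify_hits(candidate_wells, untreated_wells)
print(hits)
```

In practice the cutoff would be set from the variability of the untreated wells rather than fixed in advance.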

6. An isolated nucleic acid molecule comprising a nucleotide sequence encoding
the promoter region of a gene encoding bone morphogenetic protein selected from the
group consisting of the BMP-2 and BMP-4 proteins.

7. The nucleic acid molecule of claim 6 which corresponds to a nucleotide 
sequence selected from the group consisting of positions -2372 to +316 of the BMP-4 gene 
depicted in Figure 1C (SEQ ID NO:3), a portion thereof which encodes a biologically 
active promoter, the BMP-2 sequence depicted in Figure 11, and a portion thereof which
encodes a biologically active promoter. 

8. A recombinant expression vector comprising the nucleotide sequence of claim 6 or 7.

9. The recombinant expression vector of claim 8 wherein said nucleotide 
sequence is operatively linked to a reporter gene encoding an assayable product.

10. The recombinant expression vector of claim 9 wherein said reporter gene
comprises a gene encoding the production of an assayable product selected from the group
consisting of firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green
fluorescent protein, human growth hormone, alkaline phosphatase or β-glucuronidase.

1 GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTC^T

51 TTCCACCACA AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGC^C 

101 TTCAGAGAAA TTGGGCCAAA CTTGAGGAAA GTTCCTGGGA AAGGCTT^t 

151 AGCAGCACCT CTCTGGGCTA CAAAAAAGAA GCCAGCAGGC ACCACCAAGG 

201 TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC TTGATTACA 

251 AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 

301 AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT

351 TCTTGGGGTT GGTCCAACAG CTGTGAGCTT TCTCTTCCTC ATTAAAGGCA

401 ACTTTCTCAT TTAAATCTCA TATAGGTTCG GAGTTTCTTG CTTTGCTCCT
451 TCCGCCTCCG CGATGACAGA AGCAATGGTT AACTTCTCAA TTAAACTTGA
501 TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA CTTACACACT
551 TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA
601 TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG
651 ATCGCAAATT GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG
701 ACAGCCTTCC TTCAAAAATA CCTTATTTGA CCTCTACAGC TCTAGAAACA
751 GCCAGGGCCT AATTTCCCTC TGTGGGTTGC TAATCCGATT TAGGTGAACG
801 AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA CACGTGGGTA
851 AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC
901 AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA
951 CAGGAATGCT CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG

1001 GTACTATAAG GCTCCTGAAT GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA 

1051 ATCCTGGCTC AGCAGCTTGG GGACATTTCC AGCTGAGGAA GAAAACTGGC 

1101 TTGGCCACAG CCAGAGCCTT CTG CTG GAG A CCCAGTGGAG AGAGAGGACC 

1151 AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 

1201 TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG 

1251 ATCGCTTCTA TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG 

1301 ACACACACAC ACACACACAC ACACACACAC ACACATCACT AGAAGGGATG

1351 TCACTTTACA AGTGTGTATC TATGTTCAGA AACCTGTACC CGTATTTTTA

1401 TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT TTTATTAGAT
1451 TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA
1501 CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCAiTCAAAA ' 
1551 CAGAAGCGTT TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT 
1601 CTGTTCTTCT TTTTAATGTG CTCTTACCTA AAAACTTCAA ACTCAAGTTG
1651 ATATTGGCCC AATGAGGGAA CTCAGAGGCC AGTGGACTCT GGATTTGCCC
1701 TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG GTCGGCTTCA
1751 CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG
1801 CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG
1851 CCCGCCCGGG GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC
1901 ACGGCCGGGA GGGAGCCGGC AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG
1951 CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG TCCTCGCCCT GCCGCGCAGA
2001 GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG CCCGCCCGGG
2051 ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC
2101 CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC
2151 GACACGGGTT GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC
2201 TGTCCGGGAC AAGCTAGAGT CGCGGACCGA CGCTAAGAAC CGGGAGTCCG
2251 GAGCACAGTC TTACCCTCAA TGCGGGGCCA CTCTGACCCA GGAGTGAGCG
2301 CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT GCCACAAAAG
2351 ACACTTGGCC CGAGGGCTCC GAGCGCGAGG TCACCCGGTT TGGCAACCCG
2401 agacgtgcgg: ctggactgtc: TCGAGAATGA- GCCCCAGGAC gccggggcgct 
2451 CGCAGCCGTG^ CGGGCTCTGC TGGCGAGCGC TGATGGGGGTT GCGCCAGAGTT 
2501 CAGG CTGAGG 1 GAGTGCAGAGT TGCGGCCCGC CCGCCACCCA. AGATCETCGCT 
2551 TGCGCCCTTG* CCCGGACACG* GCATCGCCCA- CGATGG CTGCT CCCGAGCCAT 
2601 GGGTCGCGGC: CCACGTAACC CAGAACGTCC" GTCCTCCG CC CGGCGAGTCC 
2651 CGGAGCCAGC CCCGCGCCCC GCCAGCGCTGT GTCCCTGAGG CCGACGACAC 
2701 CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC
2751 CCTGCTCGAG GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC
2801 CSSSAi AGTG ACTTGGGCTC CCCACTTCGC GCCGGTGTCC TCGCCCGGCG
2851 GATCCAGTCT TGCCGCCTCC AGCCCGATCA CCTCTCTTCC TCAGCCCGCT
2901 GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC GGAGAAGGAG 
29 51 GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCCTC 
3001 GG AC AG AG CT TTTTCCATGT GGAGACTCTC TCAATGGACG TGCCCCCTAG 
3 051 TGCTTCTTAG ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG 
3101 GACCCGGGGT TGGCTGGCGG GTGACACCGC TTCCCGCCCA ACGCAGGGCG 
3151 CCTGGGAGGA CTGGTGGAGT GGAGTGGACG TAAACATACC CTCACCCGGT 
3 201 GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA CCCCAGATCC 
3251 CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 
3 3 01 GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC 
3 3 51 AGAAATCCAT TTAAGAGTAT GGCCAGTAG A TTTT ACTAGT TCATTGCTGA 
3 401 CCAGTAAGTA CTCCAAGCCT TAGAGATCCT TGGCTATCCT TAAGAAGTAG 
34 51 GTCCATTTAG GAAGATACTA AAAGTTGGGG TTCTCCATGT GTGTTTACTG 
3 501 ACTATGCGAA TGTGTCATAG CTTACACGTG CATTCATAAA CACTATCTAT 
3 551 TTAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 
3 601 GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC 
3 651 GAGGCCACAA CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG 
3701 CTTACTGAAT CTACAAGTTT GATATGCTCA ACTACCAGGA AATTGTATAC 
3 751 AGCGCCTCTA AGGAAGTCAC TTGTGCATTT GTGTCTGTTA ATATGCACAT 
3801 GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG ACCAACCTAT 
3851 GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 
3 901 GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA 

3 951 AGTAAGAAGT CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC 

4 001 .TATGGGAGCC GAGGCGCGGG GGCGGCGGAG GACTGGGCGG GGAACGTGGG 
4 051 TGACTCACGT CGGCCCTGTC CGCAGGTCGA CCATGGTGGC CGGGACCCGC 
4101 TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG GCGCGGCCGG 
4151 CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 
4 201 CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG 
4251 CTCAGCATGT TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT 
4301 GGTGCCCCCC TATATGCTAG ATCTGTACCG CAGGCACTCA GGCCAGCCAG " 

43 51 GAGCGCCCGC CCCAGACCAC CGGCTGGAGA GGGCAGCCAG CCGCGCCAAC 
44 OL ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG GTGGCGGGGC 

44 5 L GGGGACGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGG: CCTCCACTAG 
4501 CACAGTAGAA GGCCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC 
4 551 CAGGGATTCC CCGCTTGTGA GTCCTCACCC TTTCCTGGCA AGTAGCCAAA 
4 601 AGACAGGCTC CTCCCCCTAG AACTGGAGGG AAATCGAGTG ATGGGGAAGA 
4 651 GGGTGAGAGA CTGACTAGCC CCTAGTCAGC ACAGCATGCG* AGATTTCCAC 
4701 AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG CTCAGATCTG 
47 5 1 TGACTTGTGT TCACGCTGTA GTTTTAAGCT AGGCAGAGCA AGGGCAGAAT 
4 801 GTTCGGAGAT AGTATTAGCA AATCAAATCC AGGGCCTCAA AGCATTCAAA 
4 851. TTTACTGTTC ATCTGGGCCT. AGTTTGAAAG- ATTTCTGAAT.* CCCTATCTAA 
49 OL TCCCCGTGGG* AGATCAATTC CACAATTCGT CATATTGTTT" CCACAATGAC" 
4 951 CTTCGATTCT TTGCTTAAAT'* CTTAAATCTC CAAGTGGAGA CAGCG CAACG* 
5001 CTTCAGATAA. AAGCCTTTCT CCCACrGCCE GCTACCTTCC TAGGCAAGGC 
5051. AATGGGGTTr TTAAACAAAT* ATATGAATAH* GATTTCCCAA- GATAGAATAA 
5X01. TGTTGTTTAT TTCAGCTGAA ATTTCCTGGA XT AG AAAGG C TGTAGAGGCC 
5151. TATTGAAGTC: TCTTGCACCG: ATGTHCEGAA- AGCAGTTAGX AAAAAATCATT 
5201. G AC CT AG CTC AATTCTGTGT: GTGCCACXT3T CAATGTGCE3T TTGACTT AAXT 
5251 GTATTCTCCA. TAGAACATCA. GTTCCEXCAA1 GTTCIAGAAG: AATXCAGATT 

53 oi. taaagttttgt ctttgccttg? ctgaggggat: aaattttaag:- tagaaatcta 

535L GGCTCTGAAA- TGATAGCCCA- ACCCCATCTC" CAGTAAGGGA* TGACTGACTC 

5401 AAACCTTGAG* AAGTCTGGGT GATAATAGGA AAAGTCCACA. AGCAGGTCAC 

54 51 agagcgcgag; ATGGATCTGT CTTGAGGCAG CCAATGGTTA TGAAGGGCAC 
5501 TGGAAATCCA TCTCTTTCAA ACTGGTGTCT* AGGGCTTTCT GGGAGCAAAG 
5551 CTTAGACCAC ATTCTGCTCC TCAAGGTTTG CCTACTGAAA GCAGGGAGAT
5601 TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT GGGCTTAGCT

5 651 AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 
5701 CAAACCTACT ATCCAGCACA GGTGTTTTTC ' CCACTGCCTC TGGAGATATA 
57 51 GCAAGAAAAC CAT AT ATT C A TGTATTTCCT TATTAGTCTT TTCTAACGTG 
5301 AAAATT ATT C CTGACCTATA AAAAATGAAG GAGGTATTTT ATCTTAACTA 
5551 AGCTAAAAGA ATCGCTTAAG TCAATTGAAA CTCAAAAATC CAATTGAATG 

. 59 01 AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC CTTTGG.AAAT 
5951 AGCTTGATAA AAACACAGAC AAAAG AAAGT CTGTGTGCTT ATTTGAAAAC 
6001 TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT 
6051 CTCTTGTAAA ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTG C A 
6101 CTAACTTTAT ACCTTGGTGC AATTAGATGT AATGTTTACT GTAAATTTCA 
6151 GGAAAACCAT TTTTTTTTTT TGGTCATGAT CAGGTACACA TGGCATTTGG 
6201 GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG TTTGTTTGTT 
6251 TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 
6301 TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT 

6 3 51 GAGACAAAAC ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG 
6401 ACAAAATCTC CTGTTGAACA TTAAGACCAT GGATTTTTAT CCAGGAGAGC 
6451 CCAGGCTTTG CTGAATCACC ACCCTCCAAC CCCACTCCAA GGTCACCGAA 
6501 GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT TGATTGACTC 
6551 CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 
6601 ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCTIT CTTTTTCTTT 
6651 ATGGAATAAA CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA 
6701 GAGATCCTGT TAG C AG AAGT GCAACTGCCC AGAAACTAGC CACAGGCTAG 
6751 GATATTCCAA AGTACAACTC TAAAGTATGG TCCATCCTAA ATTCTAGCAT 
6301 GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT CTGGCTATTG 
63 51 CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 
6901 ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC 
6951 TTTCAATCCC TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG 
7001 CCTTGTGGCA TTTCGCTTGG TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG 
7051 GTTCC CTTGT AGGATCTGTG TTCAGGAGAA CAGGGACCTT GGCAGGTTAG 
7101 TGACAACTAC CAAACCCTGC TTTCCTTCCC TGCCACTTCC TTTGTTG CCT 
7151 TAAAAATTAA ACCTTAACTC. TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 
72.01 TGTCATTTAC TTT ATTTATT ' TGTCATGTAC TTTATCCTGT. AGAAAATCAC 
7251 AGTGTGGCCC: AAAGCCCCTT GAATCTTGTT GCAGCGGTGA GATGCAGCTG 
73 01 CTGATCTGGA AT AG CCTTAG* GCIGTGTGTT: TGATCACAAT* GCTTTCTGTC 

73 51 CAAAAGTGTG" CAAATCCTCC AAGCTTAATG ATAACTTTTG AAATGAAACT 

74 01 CACCCTACTT TAGGGCAAAC* AAGTAGCCAC AG AG AG C AG G" ATCTAAACAA 
74 51 GGTCTGGTGr . CCC ATTTGG C TGTGTCCCTTL CAATTTTCTG TTCATTTAGC 
7501 TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT* AAGTTTTGAT' CTTCAGGGCA 
7551. AAACTCAATC: TTCAGTTACC ATGGTATCAGL GTACCAATTC CTAGTGATTT* 
T6 01 GTGCTATGGC: TTAGGATTTG" ATTTCTCTCC TACATTAGGT AATATCTTTC 
7651 AATGGCTAGA. ACTTGGGCAT" TGCAGTACAC1 TCAAGTTAAC. AGTTCXGTGA 
7701 CCTAAGGAAG* TCACATAACC TCTCTGAATT: CTCTACTGTT TCATTCACAA 
7751. AATGGAGAAA. ATCATGGCTCr TTTCTTAATG* TGCGAATTCA TAGAAAGGTG" 
"SOL ATG acaccag: ATTTGGCAGA^ AGGAAGGAAA. GGAAGGAAGGI AAGAAAGAAA. 
7851. GAAAGAAAGA AAGAAAGAAA. GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG" 
7901 GAGAGAGAGA GAAGGGAAGG 1 . GAAAGGGAAA. GGGAAAGGAA AGAAAAGAAA 
79 51. GGAAGGAAGG AAAGGAAGGAi AGGAAGGAAA^ GAAGGAAGGA^ AGGAAAAGAA1 
8 0 0 11 AGAGAAGAAA-. GCATTCAGCA, TATGAACTAA, TGTTXCCIGG: TGACTTTTTAl 
80531" TATCATATCCT TUGTICTAG^ AAGTGGCCCTT AGCCATAXCT TTTGGGTTATr 
3 10 1 TTTGAGGTAC^ AGGATAATCAL ACATAGTGTAi GAACATTAAAi TCTGGGTTTTT 
3 151. GTTTCTAGAA- G AGGCTAGAA- TGGCATGGCET GTCCCXCXT& CTCCTCETTC: 
3201 AG G CAGTATGT GCAGCCACCA TTCTCTCTGT* AAGATCTAGG" AGGCTGACAC 
3251 TCAGGTTGGA GACAGGTCAC AATCCTGAAA TCACTTAGCA AGTTCAGCTG* 

. 33 01 ATTCAACAAG* GG ATATTTACT AG AG AATTAA CAGCTATTCC AGCTTCCAAA 
8351 AAGTGTACAT TACCTACTCT GTATTTTCAG AACCCCAGGT TTGCTGTGAT
8401 AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGA^AT""™

3 451 CATTTTCCAC CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA 

8S01 GGTGTTGTGT GTCCTGACCC TGAGGAAGTT GGCCTTGTTG AGGTCTTCTG 

3 551 T AAATT CTTG AATTGTCTGT ATAATTTCAA TGAATAGTCA TGTT^GATAC 

3601 CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG CTGATGGAAA 

3651 CGCTGCTGAA AG ACT AG AG A TTGCTCTTTC CTTTGC-CATC ■ TGTCTTGGGT 

3 701 AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT 

3751 CTTACCTCCA GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTG C AG 

8801 TTGAAAGATG ACAATTTCAT ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG 

3851 AG AT ATT ATT CTGTTACTTC AATGACCTGT CTCCATTATT TATCTTGAGG 

3901 CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG AAGGCCCTGG 

8951 GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 

9001 ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT 

9051 ACTGTTTTTA TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA 

9101 CACCAGAATG GTGATGAAAG GCTTTTTATA ACAGAAGCTA AATGCAGTCC 

9151 TTCATACTTC ATGGAATGCC CCTGTCCTAA AGTACCATTA ACCGATAGTG 

9201 GAGTCAGAAC AT AAATGG CT CCCCAAAGGT ATCACCAAGA ACTTTTGGCA 

9251 AACAGATGCA AGAGGATTAT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 

9301 CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG 

93 51 CTTAACATAG CATGAACCCT CATGTGTTGG CCAACATTAA GGCTTTTTCT 

94 01 ATAAAAGTCT CCTCCTTCAT CAGTATACGC TCGAGTATGA AAAGCATCCT 
9451 TTTAAACCTT GACTCTGTGT GGTCCAGAAA CAGCAGCATC CCTTGCTTAA 
9 SOL GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA CCGGCTGATG 
9551 TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 
9 601 TGAGGATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA 
9 651 TCGTGAATTG CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC 
9701 TATGTCCCTG TGGATGCCTT TGTCTCTAGA GTTCTGAGCA AAATG GTAGG 
9751 GTGTG CTTTG CAAAATGTCA TCATTGATGT TGAATTTCAA AGTCTTTAAT 
98 01 . TAAGGGG CTG AAATCTGTAT ATTGAGATTT GTAAATCATC TAAATTGTAG 
9851 AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 
9901 TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTCAGTCA TTAGCTGTCT * 
9951 ACTAAGTACA GTGCTGACTT ' TTTTAAAATT AAAGT CTGTG AATTCCAAAG 

10001 AAGTGTTTCA CTATTTCCTC CATTATTATA GCTACCTAGA AG CTATGTTC 

10051 ATATATTGGA TTAAAAACGT AGCAATTACA AAGTTAATGT GGCCATATAG 

10101 AAAAGGGAAA AGAAACTCCG CTTTCACTTT AATATATATA TGTGTGTGTG 

10151 TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 

10201 TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC 

10251 TCAGGCCATG GACAG AGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT 

103 01 TAATTTCTGG TTGCAGGTTT TTATGTAGTT TAACTTAAAC CATGCACTGA 

103 51 AGTTTTAAAT GCTCGTAAGG AATTAAGTTA CCATTGGCTC TCTTACCAAA 

104 03- TGCGTTTCTT" TTTTCTCTCC ACCCTGATCA AACTAGAAGC GGTGGAGGAA 
104 51- CTTCCAGAGA TGAGTGGGAA AACGGCCCGG. CGCTTCTTCT* TCAATTTAAG 
10501. TTCTGTCCCC* AGTGACGAGT TTCTCACATC TGCAGAACTC CAGATCTTCC 

10551 gggaacagat acaggaagcz ttgggaaaca gtagtttcca gcaccgaatt 

10602. aatatttatg aaattataaa gcctgcagca gccaacttg a aatttcctgt" 

10651. gaccagacta ttggacacca ggttagtgaa tcagaacaca. agtcagtggg 

10701 agagcttcga cgtcacccca g ctgtg atgc ggtgg accac acaggg ac ac 

1075]_ accaaccatg: ggtttgtggt: ggaagtggccv catttagagg: agaacccagg^ 

loeoi: tgtctccaag: agacatgtga. ggattagcag:- GxerrrGCAcr caagatgaact 

10851. ACAGCTGGTC ACAGATAAGG-" CCATTG CTAGT TGACTTTTGG 1 ACATGATGGA. 

10901 AAAGGACATC CGCTCCACAA ACGAGAAAAC CGTCAAGCCA. AACACAAACA 

10951- GCGGAAGCGC" CTCAAGTCCA. GCTG CAAGAGI ACACCCTTTG:' TATGTGGACE 

11001 TCAGTGATGT GGGGTGG AAT GACTGGATCC? TGGCACCTCC GGGCTATCAT 

110 51 GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 

11101 CTCCACTAAC CATGCCATAG TGCAGACTCT" GGTGAACTCT GTGAATTCCA. 

11151 AAATCCCTAA GGCATGCTGT GTCCCCACAG AGCTCAGCGC AATCTCCATG
11201 TTGTACCTAG ATGAAAATGA AAAGGTTGTG CTAAAAAATT ATCAGGACAT
11251 GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA CAGCAAGAAT AAATAAATAA 
11301 ATATATATAT TTTAGAAACA GAAAAAACCC TACTCCCCCT GCCTCCCCCC 

113 51 CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 
11401 GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC 

114 51 GAAAAGAAGT TGGGAAAACA. AATATTTTAA TCAGAGAATT ATTCCTTAAA 
11501 GATTTAAAAT GTATTTAGTT GTACATTTTA TATGGGTTCA ACTCCAGCAC 
11551 ATGAAGTATA AGGTCAGAGT TATTTTGTAT TTATTT ACT A TAATAACCAC 
11601 TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA TCAGAAG AAA 
11651 TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 
11701 TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA 
11751 AAGTGCTTGC ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT 
11801 TCACAAGTTC AAGTCCAAAA AAAAAAAAAA AGGATAATCT. ACTTTGCTGA 
11351 CTTTCAAGAT TATATTCTTC AATTCTCAGG AATGTTG CAG AGTGGTTGTC 
11901 CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG GATAAGAACC 
11951 AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 
12001 AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG 
12051 AAAGTGAGTG AATGGCTCTT GTTCTTTCTT AAGCCTATAA TCCTTCCAGG 
12101 GGGCTGATCT GGCCAAAGTA CTAAATAAAA TATAATATTT CTTCTTTATT 
12151 AACATTGTAG TCATATATGT GTACAATTGA TTATCTTGTG GGCCCTCATA 
12201 AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC AGCAATCTCT 
12251 CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 
123 01 TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG 7CAGCTTGCT 

123 51 TTTCTTTCCA G G AT ATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA 
12401 TTTCAGATAA TAAATATCAA AATTCTGGCA TTTTCATCCC TATAAAAACC 

124 51 CTAAACCCCG TGAGAGCAAA TGGTTTGTTT GTGTTTG CAG TGTCTACCTG 
12501 TGTTTG C ATT TTCATTTCTT GGGTGAATGA TGACAAGGTT GGGGTGGGGA 
12551 CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 
12601 GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG 
12651 AAAGCCAATG CCTAAATCAA. AGCAAAGTTT GCAGAACCCA AGGTAAAGTT 
12701 CCAGAG ATGA TATATCATAC AACAGAGGCC ATAGTGTAAA AAAATTAAAG* 
12751 AATGTCTGAT CAGCGTCTCA GCACATCTAC CAATTGGCCA GATGCTCAAA 
12801 CAGAGTGAAG TCAGATGAGG" TTCTGGAAAG TGAGTCCTCT ATGATGGCAG 
12851 AGCTTTGGTG CTCAGGTTGG. AAGCAAAACC TAGGGAGGGA GGGCTTTGTG 
12901 GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG 
129 5 1 TCAGTTTCCG G AGTGTGTGT. CCTGTAGCCC TCCGTCATGG TTGAAGCCCA 
13 003. GGTCTCACCT CCTCTCCTGA CCCGTGCCTT AGAACTGACT* TGGAAAGCGG 
13051 TGTGCTTACA GCAAGACAGA CTGTTATAAT TAAATTCTTC CCAAGGACCT 
13101 CCGTG CAATG ACCCCAAGCA. CACTTACCTT CGGAAACCTT AAGGTTCTGA 
13151 AGATCTTGTT* TTAAATGACT' ACCCTGGTTA GCTTTTGATG" TGTTCCTTAT* 
13201 CCC TTTAGTT: GTTGCACAGG TAGAAACGAT" TAGACCCAAC* TATGGGTAGC 
13 251 CTTGTCCTCC TGGTCCTTCA GTCATTCTCT AATGTCTCTT.' GCTTGCCATG 

' 13301 GGCACTGTAA. CAAACTGCAA TCTTAACATCT'TTATAAAATG" AATGAACCAC 
133 51 ATATTTACAT" CTCCAAGTCC TCCAGATGGG AGTGCGATCA TTCCATAAGG* 
13403- ATCCCACCTT CTGGCAGGTC TAXCCAGTAC ATATCTTATC CTTCATTGGT: 
13 4 51 CTTGATTTTCT TTGGCTAAAA. TTACTTGTAGT CACAGCAGGC1 CCCATGTGAC 
13501 ATATAGGTAT ATACATACAT" GTATGTGCATT ATAGTGTGTA CATGTTCTAA 
13 551. TTTATACATA*. GCTATGTGAA. G ATTATGTTA1 CATATGTAG A_ TGGTCGCACTT 
13601 TCIGATXTCCT ATTTAGGTTCT AGAGAGAGACT GTCXCAGTAA.- ATGGAGCTATI 
13651. GTCATTGGTA. TAICCCCGAG TGGTXCAGGTT GTTCECTCTA. TTXXTTTAAG: 
13 70 1. ATGGAGAACA^ CECATCTGTAl CTATCGAAAAi. CTGAGCCAAA. TCACCTAGCA 

i3753_ aatttctagt: cactgccttc^ cxgttaagat: ACTGATTCAC TGGGTGCTGA- 

12801 catg ctgago cctgcctactt tttgcatgaa ggacaaggaa, gagagcttgc 

138 51 agttaagaax ggtatatgtg gggctagggg gcggcgtata. gactggcata 

139 01 tatgtgaagg aaggtcacaa acagcctgca ctaatttccc- ttttctggtt 
13951 TTATGTCTTG GCAGGGGAAA GGACAGGTAG GGTGGGGTTG AGGGGGAGGG
14001 CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT TCACCCCGCC
14051 ACCATATCTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACCA
14101 GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGC
14151 CAAAGTTCTA AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA
14201 GGAATGCCTC CAGGAAAGCA AAAAGCTTGA TGTGTGTACA GCCACGTGGT
14251 GGAGTCCTGC CACCCTATGA TTCCTGTCCC AGTGGTCGTG TGGGGCCTGA
14301 GATCCTGAAT TTCTAATGAG GTCCCAGTAC GCCCTGACTC ACTGTGCCAG
14351 AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT
14401 ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA
14451 TGAGAAGAAA CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC
14501 CCTGCAGTTT TGTTTTTTGT TTTTTTTTTT TCAAAGACAT TCTTTTTCCC
14551 AACAAGATGA GTGGCAATCT TATGTTCTAG CCACTCTTAG ACATGAAAAC
14601 ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGdTG CTTGGGCACG
14651 CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATGAGGG
14701 AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAG
14751 TCAGTGTGCT GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA
14801 AAGGGATCTC TTTGAAGGCA GAGGGTAGAT GTCGTATGGT TGAAGCATTT
14851 GTTTATACTA AAATGATGCT TGACTTTTTT TCTAAGTTAT AAGACAGTAC
14901 ACTGTATAAG TTCATTGAAC CTAGAGGGTG GCATAGGACT CCAAATCTGG
14951 TATGGGAGGT TTGTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT
15001 GGAATAAAGT GCTTATGTGA ATGGGCTTAA GCTAGGGAAA AAAATGGGTT
15051 TCCCTCTGCA AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC
15101 ATGAATGCCA CTTGTTAGCA GATGCCCTGT GGGGATCCGA ATTC

1 GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG
51 CTGCCTCTGT CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG 
101 GCTAGTTTGT ATCCATCTAA ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC 
151 AGAGATAACA GCTGGGTTTT CCCATCAAAC ACCTAGAAAT CCATTTTAGA 
201 TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC AGACTGGGTT

251 TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA

301 ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG

351 AACAAACACC ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA

401 GGCAAAGCTG CACCCCTAAG GACAACGAAT CGCTGCTGTT TGTGAGTTTA
451 AATATTAAGG AACACATTGT GTTAATGATT GGAGCAGCAG TGATTGATGT
501 AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT ATGGGAGCAC
551 AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA
601 GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT
651 GTTAGAAGCT GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT
701 TGCCATACCT TGATTAATCG GAGATTAAAA GAGAAGGTGT ACTTAGAAAC
751 GATTTCAAAT GAAAGAAGGT ATGTTTCCAA TGTGACTTCA CTAAAGTGAC
801 AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT CATGGAGACC

851 TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG

901 AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC
951 AGGATGTTGT GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT

1001 CACCCTTTGT CTCTGGCCAG TAGAATACAG GAACTCGTTC CTGTTTTTTT
1051 TTTTTTAAAT TCTGAAGGTG TGTAAGTACA AAGGTCAGAT GAGCGGCCCT 
1101 AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC CACCCCAGAA 
1151 ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 
1201 CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG 
1251 GCTGTGATTT CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT 
1301 TCCCCATCCT CGTTTGTAAA AAGAAAGATG AAGCTGATAG TTCCTTCCCA

1351 GCTCCATCAG AGGCAGGGTG TGAAATTAGC TCCTGTTTGG GAAGGTTTAA

1401 AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA ACTCTTGTTT
1451 CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT
1501 TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG
1551 GCACTGTGTT ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA
1601 AATAACCATC TGGCAGTTAA GAAGGCTTCA GAGATATAAA TAGGATTTTC
1651 TAATTGTCTT ACAAGGCCTA GGCTGTTTGC CTGCCAAGTG CCTGCAAACT
1701 ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG GAGGGCACCC
1751 AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA
1801 CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC 
1851 TCCTTCGGGC ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC 
1901 CTTTAACAGT TTACGGAAGG CCACCCTTTA AACCAATCCA ACAGCTCCCT* 
1951 TCT CCATAAC: CTGATTTTAG AGGTGTTTCA TTATCTCTAA TTACTCGGGG 
2001 TAAATGGTGA TTACTCAGTG" TTTTAATCAT CAGTTTGGGC AGCAGTTATT 
2051. CTAAACTCAG. GGAAGCCCAG". ACTCCCATGC GTATTTTTGGT AAGGTACAGA 
2101. GACTAGTTGCT TGCATGCTTT CTAGTACCTC TTGCATGTGG" TCCCCAGGTG. 
2151. AGCCCCGGCTL GCTTCCCGAG. CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA 
2201. ACTGAGGG CTL GGGGAGCTGT GCAATCTTCC GGACCCGGCC TTGCCAGGCG 
2251 AGGCGAGGCC CCGTGGCTGG" ATGGGAGGAT. GTGGGCGGGG CTCCCCATCC 
2301 GAGAAGGGGA. GGCGATTAAG' GGAGGAGGGA AGAAGGGAGC GGCCGCTGGG. 
23 si gggaaagact: GGGGAGGAAG* GGAAGAAAGA. GAGGGAGGGA. AAAGAGAAGG, 
2.40 H AAGGAGTAGA. TGTGAGAGG<X TGGTGCTGAG; GGTGGGAAGG; CAAGAGCGCG; 
Z451. AGGCCTGGCCT CGGAAGCTAG" GTGAGTTCGGT CATCCGAGCT GAGAGACCCC 
2501. AGCCTAAGACT GCCEGCGCTG" CAACCCAGCC TGAGTATCTG: GTCTCCGTCC 
2551 CXGATGGGAT TCTCGTCTAA ACCGTCTTGG: AGCCTGCAGC GATCCAGTCT 
2601. CTGGCCCTCG". ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC CCAGAAGCAG 
2651 CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAGCA GGGCAACCTG 
2701 GAGAGGGGCG CTATTCTGAG GATTCGAGGT G C ACCCGTAG" TAGAAGCTGG 
2751 GGATGGGGCT CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC
fITfT™? AGTGTTGGAG GTGGGATGAT GGAGGCTAAA AGGCACCTCC

*™™ GTTA ^GCGTCTAT CAACCTACTT TAGGGAGGTG CGGGCCAGGA 
2901 GAGuCGGGAA GGACAGAAGG CCTTGGAAGA GAGGTCATTG GGAAGAACTG 

TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 
;°°r ^^1^51 CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT 
1°*; * AG f£2 AGAG AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC 

SS£2£S?? TAAAGCGCGA CCCGGGATAG AGAGGCTTCC TTGAGCGGGG 
3151 TGTCACCTAA TCTTGTCCCC AACGCACCCC CTCCCAGCCC CTGAGAGCTA 
3201 GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC TATTTTCTTA 
32S1 GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 

A ^ GTGTAA GGT *CCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC 
TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT 
3401 aATGAGGAGC TTGCGATACT CCAAAGGGTT CGGGAATTGC GGGGTCCTTA 
Hn, ^ G ^ AG « GG AGTTGGGCCC CTTTTACTCA GAAGGTTTCC GCCACGG CTT 
3501 TGGTTGATAG TTT TTTT AGT ATCCTGGTTT ATGAACTGAA GGTTTTGTGA 

GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 

3 601 AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC 
3651 TCCAGGAGCT GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG 
3701 AAAATTTACT TTGATCTGTA TTTTTTAATT AAAAAAAATC AGGGAAGAAA 
3751 GGAGTGATTA GAAAGGGATC CTGAGCGTCG GCGGTTCCAC GGTGCCCTCG 
3801 CTCCGCGTGC GCCAGTCGCT AGCATATCGC CATCTCTTTC CCCCTTAAAA 
3851 GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 
3901 AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA 
39S1 AAGCTGCGCT GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG 
4001 GTCTCTGCAC CCAGGGCAGC AGTGTGGGAT GGCGCTGGGC AGCCACCGCC 
4051 GCCAGGAAGG ACGTGACTCT CCATCCTTTA CACTTCTTTC TCAAAGGTTT' 
4101 CCCGAAAGTG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG GGGGGGGGGA 
4151 GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 
4201 AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC 
4251 CTCGGATGGA CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG 
4301 AGGTTAGAAC AGGCCAGGCA TCTTAGGATA GTCAGGTCAC CCCCCCCCCC' 
4351 AACCCCACCC GAGTTGTGTT" GGTGAATTTC TTGGAGGAAT CTTAGCCGCG 
4401 ATTCTGTAGC TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA GGAAGTGGCT 
4451 GTGCGGGGGT GGCGGTGGGC GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 
4501 AG AGG GAG AG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGG" TTTGTAGACG 
4551 CTTGGGATCG CGCTTGGGGT; CTCCTTTCGT GCCGGGTAGG* AGTTGTAAAG 
4601 CCTTTGCAAC TCTGAGATCG"- TAAAAAAAAT GTGATGCGCT. CTTTCTTTGG 

4 651 CGACGCCTGT TTTGGAATCT. GTCCGGAGTT AGAAGCTCAG' ACGTCCACCC 
4701 CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC CGACCGGTTT 
4-751 CTGAAGGATC TGCTTGGCTG* GAGCGGACGC" TGAGGTTGGC AGACACGGTG 
4801 TG GGGA CECT* GGCGGGGCTA- CTAGACAGTA CTTCAGAAGC CGCTCCTTCT 
4851 AACETTCCCA CACCGCTCAA ACCCCGACAC CCCCGCGGCG GACTGAGTTG 
4902_ GCGACGGGGT CAGAGTCTTC; TGGCTGAAAG. TTAGATCCGC TAGGGGTCGG" 
4951. CTGCCTGTCG- CTAGAAGCAT: TATTTGGCCr CTCGGAGACC" CGTGTGGAGG". 
5001 AAGTGCTGGA. GTGTGCGAGX. GX.GTTTGCGT GTGTGTGTGT GTGTGTGTGT: 
50 5 L GTGTG TGTGT: GTGTGTGTGT" GTGCGCGCGC CCTTGGAGGG TCCCTATGCG 
5101 CTTTCCTTTTT CATGGAACGC TGTCGTGAGG CTTTGGTAAA CTGTCTTXTC 
5151 GGTTCGTCTC TCGGCTGCACT TTAAGCTTTGT TCGGCGCTGT AAAGAGACGC
5201 gtcetcaagt gcaccctgat cctcaggcttt cagataaccct gtccccgaact
5251 ctggccagat gcattgcacr gcgcgccgca ggtagagacgt tgccccacgit
5301 ccccegcgtc cagcgactac gaccgagagc cgcgccagtg tggtgtcccg
5351 ccgagacttcr ctcagagcagt gcggggacaa ctcccagacg gctggggctc
5401 cacctgcggg cgcggaggttt ggcctcgctc gcaggggctg gacccagccg
5451 gggtgggagg atggaggagg ggcgggcggg ctcttcggtg agtggggcgg
5501 ggcctctggg tccacgtgact tcctagggsc tggaagaaaa acagagcctg
5551 tctgctccag agtctcatta tatcaaatat cattttagga gccattccgt
5601 AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG CCTTTCCAGC
5651 AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT
5701 GTTTTCTCTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC 
5751 TCTGTCCTCC CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT 
5801 TTCAGTGCCC CGCGCCCTAC TCTCAGGCAG CGCTATGGTT CTCTTTCTGG 
5851 TCCCTGCAAG GCCAGACACT CGAAATGTAC GGGCTCCTTT TAAAGCGCTC 
5901 CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG AGCGCGAGGG
5951 ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCCCTT GAAGGGCTAA
6001 CCACTCCCTT ACCAGTCCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC 
6051 CTTGCCTCAT AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG 
6101 AACAAGACTC TTTAGTGAGC ATTTTCAACG CAGCGACCAC AATGAAATAA 
6151 ATCACAAAGT CACTGGGGCA GCCCCTTGAC TCCTTTTCCC AGTCACTGGA 
6201 CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT CTCCCCTCCT 
6251 CCTGTTCTTA ACCAGCTGGA AGTTGTGGAA ATTGGGCTGG AGGGCGGAGG 
6301 AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG 

6351 TCCGAAGTGA AGAGCGATGG CATTTTAATT CTCCCTCCNC CTCCCCCCTT
6401 TACCTCCTCA ATGTTAACTG TTTATCCTTG AAGAAGCCAC GCTGAGATCA
6451 TGGCTCAGAT AGCCGTTGGG ACAGGATGGA GGCTATCTTA TTTGGGGTTA
6501 TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA TTCTTACTTT 
6551 CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 
6601 GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT 
6651 TGCTACCCCT ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT 
6701 CGGTGTTTAT TTATTCTTTA ACCTTCCACC CCAACCCCCT CCCCAGAGAC
6751 ACCATGATTC CTGGTAACCG AATGCTGATG GTCGTTTTAT TATGCCAAGT
6801 CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG ACCGGGAAGA
6851 AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG
6901 AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG
6951 GCTGCGCCGC CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA
7001 TGAGGGATCT TTACCGGCTC CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG 
7051 AGCCAGGGAA CCGGGCTTGA GTACCCGGAG CGTCCCGCCA GCCGAGCCAA
7101 CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC TTAGTCCTGG
7151 CGGTGTAGGG TGGGGTAGAG CRCCGGGGCA GAGGGTGGGG GGTGGGCAGC
7201 TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC
7251 ATGTTACATC AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA
7301 CCTTCAACTT TCCAGAGCTG CCTCTGAGGG TACTTTCTGG AGACCAAGTA
7351 GTGGTGGTGA TGGGGGAGGG GGTTACTTTG GGAGAAGCGG ACTGACACCA
7401 CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA TACCAAAGTC 
7451 AGGGATTCTG CCCGTTTTGT TCGAAAGCAC CTACTGAATT TAATATTACA
7501 TCTGTGTGTT TGTCAGGTTT ATCAATAGGG GCCTTGTAAT ACGATCTGAA
7551 TGTTTCCTAG CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC
7601 TCCAGCATCA TTACTGTGTT GGAATTTATT TTCCCTTCTG TAACATGATC
7651 AACAAGGCGTT GCTCTGTGTT TCTAGGATCG CTGGGGAAAX GTTTGGTAAC
7701 ATACTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCETT TTCTTTACAA
7751 CCACtTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG
7801 GAGTCCAAGG GTGGTGGAGTL AAAAGAGTTG ACACATGGAA ATTATTAGGC
7851 ATATAAAGGA GGTTGGGAGA TACTTTCrGT CTTTGGTGTT TGACAAATGT
7901 GAGCTAAGTE TTGCTGGTTT GCTAGCTGCT CCACAACTCT GCTCCTTCAA
7951 ATTAAAAGGC ACAGTAATTTT CCTCCCCTTA GGTTTCTACr ATATAAGCAG
8001 AATTCAACCA ATTCTGCTA3T TOgriX»m:jl T G TT T CXTGir TTTTGTTTTG
8051 tttggttttt rrrrrrrrri! Ttr mTr r r gtctcagaaa agctcatggg
8101 CCETTTCETTI TCCCCTTTCA ACTGTGCCTJL J GAACATCTGG AGAACATCCC
8151 agggaccagt gagagctctc c mrrcG xnr ccrcrrcAAcr ctcagcagca
8201 TCCCAGAAAA TGAGGTGAXC TCCTCGGCAGT AGCTCCGGCT CTTTCGGGAG
8251 CAGGTGGACC AGGGCCCTGA CTGGGAACAG GGCTTCCACC GTATAAACAT
8301 TTATGAGGTT ATGAAGCCCC CAGCAGAAAT GGTTCCTGGA CACCTCATCA
8351 CACGACTACT GGACACCAGA CTAGTCCATC ACAATGTGAC ACGGTGGGAA
GCTGGACTTT
TTGACCTTAT
ATTTTGACAA 



TGAGCCCTGC 
CTGGCCATTG 
GCATGTCAGA 
AACTCCGCCC 
TTGACCCGCA 
CAGGAAGAAG 
GTGACGTGGG 
TACCTGGATG
TATCTTAAAA
ATATATTTAT



AGTCCTTCGC 
AGGTGACTCA 
ATCAGCCGAT 
CCTCCTGGTC 
GGAGGGCCAA 
AATAAGAACT 
CTGGAATGAT 
ATGGGGACTG 
GCCATTGTGC 
CTGTTGTGTC 
AGTATGACAA 
TGTGGATGCC 
CACACACACA 
TACACATACC 
AAAAAAAAAA 
AAAGAAAAAA 
ACGTGCAAAT 
AACTACATAT 



TGGACCCGGG 
CGTTACCTCA
ACGTAGTCCC
TGGATTGTGG
AGACCCTAGT
GGTGGTGTTG
CACACACACA

ACACAAACTG 

GAAAGAAAGA 

AAAACCCTAA

GTTTTGACCA 

TAAAAGAAAA 



AAAAGCAACC 

ACACGGACCC 

AGGGAGTGGA 

ATGATGGCCG 
TTCACTATAC
AAGAAAGAAA

ACAACTCACC 

TATTGATCAT 

TAAAATGAG 
bntp2p

GAATTCATTTAAACTv iTTCACTTCTAGGTCCCATGCGTTTACACl v Ct 

TTCCACCACAAGAGGGCAGCCATCTCTAAAAAAACAAGACT 

TTCAGAGAAATIXjGGCCAAACTTCAGGAAAGTPCCTC^^ 

AGOUSCACCTCTCTGGGCrACAAAAAAJ^^ 

TCGAGTAACTGTCCAGAGGCATCCATTTTACCTCAC^^ 

AGGATATCCTAAACGGCCAAACTCTCTCTTCTGGTC 

AGCIXSC^LRGGCATTGTTGATCTCATCACCAAAGGTT^ 

TCTTCGGGTTGGTCCAACAGCTC 

ACTTTCTCATTTAAATCTCATATAGGTTCGGAGT^ 

TCCGCCTCCGCGATGAGAGAAGCAATGGTTAACTTCTC^ 

TAGGGAAGGAAATGGCTTCAGAGGCGATO^ 

TACACX3TCTGAGTGGAGTGTTTTATTCCOGCCTTGTTTGGTC 
TTCAGAGTCACAACTTCTGCA^ 

ATCGCAAATTGCrGGATCTATCCCTTCCTCTCCTTTAATTTCCCTTC 

ACAGCCTTCCTTCAAAAATACCTTATTrGACCT 

GCCAQGGCCTA ATTTC CCTCT^^ 

AACCTAGAGTTATTTTAGCTCCCC6ACTGAAAAGCTAGCACACG 

AAAAAATCATTAAAGCCCCTGCTTCTGGTCTTTCT 

AAACTGGAAAGATCTGGTTCACAACGTAACGTTATTC^ 

AOVGGAATGCTCAGCCC^TAGTTTTGGGGGTCC^ 

GGTACTATQAAGGCTCCTGAAIXnW3X3GAGAAAT^ 

AGAATCCTGGCTCA^ 

TGGCTTGGCCACAGCOVGAGCCTTACTGC^^ 

GGACCAGGCAGAAAATTCAAAGGTCTCAAACCGGAATTGTCriXj 

GACTCTGGAGTAGGTGGGTGTGGAAGGGAAGATAAATATCACAAGT^ 

AAGTGATQjCTTCTATAAAGAGAATITCTATTAACTCT 

ACATGGACAC?LCAeACACACACA»^ 

GGGMCT 

ATTTTTATAA7TTACATAAATAAATACATATAAAATATATC 

ATTAGATTCATTTATTTGAATATAAATCn^ 

TAATCCACTCAGATGTGTATCGGCTAT^^ 

TTCAAAACAGAAGCGTTTGCTCACATTTTTCCCAAAATG 
GTAMTTCTCTTCTTCTTTTTAATGTGCTC^ 

CAAGTTGAATATTGGCCCAATGAGGGAACTCAGAGGCCAGTG 

ATTTGCCCTAGTCTCCroC^CTGTGGGCGa^ 

CGGCTTCACACTCATCCGGGACGCGACCCC^ 

CCECCCCGCTCCACCGO^CGCCCCGT^^ 

GCGCGCCXSCTCCCGCCa^CCG^ 

GGGGAGGTGTIXX3GCCACGGCCGGGAGGGAGC0GGCAG(^^ 

tTAAAAGCCGCGAGCGCOKXXrCACGGCGCCT^^ 

TCCTCGCCCTGCCGCGCAGAGCCCTGCTCGCACTGO^ 

CGC^CCCACAGCCCGCCCGGGATTGGCAGC^^ 

GGCGACACCAGGCACXX^axXX^ 

CGCGGCTTCGAGGGACTCGCAOGACACGGGTTCGAACTCC^ 
OGCCTGGCGCTGTGGCCTOGGCTGTCCGGGAGAAGCTAGAff 

GACGCTAAGAACCGGGAGTCOGGAGCACAGTCTTACCCTCAATGCGGGOT 
CACTCTGACCCAGGAGTGAGOGCXXaUlGGCGM 

GGACCCCAGGCTGCCACAAAAGACACTTGGCCCGAGGGCT 

GGTCACCCGGTTTTCCAACCCGAGACGCGOGGCTGGACT 

GAGCCCCAGGACGCCGGGGCGCOGCAGCCGTGOGGGCTCTGCTCGCGAG^ 

GCTGATGGGGGTCCGCCAGMTCAGGCTGAGGGATCC^ 

GCCCGCCACCCAGATCTTCGCTGCGCCCT^ 

ACXjATGGCTGCCCCGAGCCATGGGTCGOGGCCCAGCTAA 

OnCCCTCGCCC^CGAGTCCCGGAGCaiGCCCCGC^ 

GGTCCCTGAGGCCGACGACAGCAGCAGCCTTGCCTCAGCC^ 

GTCCCGGCCCCGCACTCCTCCCCCIG 

GGTGGAGACTTCTIXaAACTTGCCGGGAGAGTGACTTCGGC^ 
GCGCCGGTGTCCTCGCCCpGCGGATCC 

Figure 11 